Start here
The basics, without the jargon.
Genes, SNPs, alleles, what binds to them, and haplogroups — explained at a level that makes the rest of this site make sense.
What is a Gene?
Your DNA is a molecule shaped like a twisted ladder (the famous double helix), and one full set of it runs about three billion rungs (base pairs) long. Each cell carries two sets, so if you stretched out the DNA in a single human cell, it would be roughly two metres long. It's packed into 23 pairs of chromosomes, tucked into the nucleus of almost every cell in your body.
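The "roughly two metres" figure follows from simple arithmetic. The sketch below assumes rounded textbook values: about 3.1 billion base pairs per set of chromosomes, two sets per cell, and about 0.34 nm of length per base pair in the double helix.

```python
# Rough arithmetic behind the "about two metres" figure.
# Assumed, rounded values (not measurements from this site):
BASE_PAIRS_PER_SET = 3.1e9     # ~3.1 billion base pairs per set of chromosomes
SETS_PER_CELL = 2              # one set from each parent
RISE_PER_BP_METRES = 0.34e-9   # ~0.34 nm of length per base pair

def dna_length_metres(base_pairs_per_set: float, sets: int, rise_m: float) -> float:
    """Total length if every base pair in the cell were laid end to end."""
    return base_pairs_per_set * sets * rise_m

length = dna_length_metres(BASE_PAIRS_PER_SET, SETS_PER_CELL, RISE_PER_BP_METRES)
print(f"{length:.2f} m")  # prints 2.11 m
```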
A gene is a specific stretch of that DNA that carries the instructions for making something useful, usually a protein. Proteins do almost everything in the body: they form structures, carry molecules, speed up chemical reactions, and regulate other genes. You have roughly 20,000 protein-coding genes, yet the parts of them that actually code for protein make up only about 1–2% of your total DNA. The rest includes regulatory sequences that control when and how much genes are used, along with stretches whose functions we're still figuring out.
Think of it this way: if your DNA is a library, chromosomes are the individual books, and genes are chapters within them. Each chapter contains the blueprint for a specific protein. You carry two copies of each chromosome, one from each parent, which means two copies of most genes (the exception is the X and Y sex chromosomes in men, who have one of each). Those two copies aren't always identical, and that's where things get interesting.
What is an SNP?
DNA is written in a four-letter alphabet: A, T, C, and G (the nucleotide bases). A Single Nucleotide Polymorphism (SNP, pronounced "snip") is a single-letter swap at one position in the sequence. For example, where most people have a C, you might have a T. That one-letter difference is an SNP.
SNPs are incredibly common: any two people typically differ at roughly 4–5 million positions across the genome. Most of those differences are effectively silent. They sit in regions that don't code for anything critical, or the swap doesn't change the protein that gets built. But some SNPs matter a great deal. The MTHFR C677T variant (rsID: rs1801133), for example, is a single C→T swap that reduces the enzyme's activity by roughly 30–70%, depending on whether you carry one or two copies, and this affects how your body processes folate.
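Spotting a single-letter swap is, at its simplest, a position-by-position comparison. The sketch below assumes two already-aligned sequences of equal length (a made-up 10-letter stretch), so it catches single-letter swaps only, not insertions or deletions.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference base, sample base) for every mismatch.

    Assumes both sequences are already aligned and the same length, so only
    single-letter swaps are detected. Positions are 1-based, as genome
    coordinates usually are.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]

# Hypothetical stretch: the sample carries a T where the reference has a C.
print(find_snps("GATCCGATCA", "GATCTGATCA"))  # [(5, 'C', 'T')]
```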
Each SNP has a unique reference ID called an rsID (reference SNP ID), assigned by the NCBI dbSNP database. These look like rs1801133, rs1805087, rs4846048. They're how researchers and databases refer to specific positions in the genome consistently across studies.
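Because rsIDs follow a fixed visible pattern ("rs" plus a number), software can check and sort them easily. The validation rule in this sketch is an assumption drawn from the examples above, not dbSNP's formal specification, and `parse_rsid` is a hypothetical helper name.

```python
import re

# Assumed format, based on the examples: "rs" followed by one or more digits.
RSID_PATTERN = re.compile(r"rs(\d+)")

def parse_rsid(text: str) -> int:
    """Return the numeric part of an rsID, raising ValueError if it is malformed."""
    match = RSID_PATTERN.fullmatch(text.strip().lower())
    if match is None:
        raise ValueError(f"not an rsID: {text!r}")
    return int(match.group(1))

for rsid in ["rs1801133", "rs1805087", "rs4846048"]:
    print(rsid, parse_rsid(rsid))
```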
What is an Allele?
An allele is the specific version of a gene — or a specific letter at a position — that you actually carry. Because you have two copies of every chromosome, you have two alleles at every position: one inherited from your mother, one from your father.
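Because each parent passes on one of their two alleles, each with equal chance, you can work out the odds of each combination in a child. The sketch below is a standard Punnett-square calculation for a single position under simple Mendelian inheritance; the parent genotypes are hypothetical.

```python
from itertools import product

def offspring_genotypes(mother: str, father: str) -> dict[str, float]:
    """Punnett-square probabilities for one position.

    Each parent is written as two alleles (e.g. "CT"). A child inherits one
    allele from each parent, each equally likely. Letters are sorted, so
    "TC" and "CT" count as the same genotype.
    """
    counts: dict[str, int] = {}
    for m, f in product(mother, father):
        genotype = "".join(sorted(m + f))
        counts[genotype] = counts.get(genotype, 0) + 1
    total = len(mother) * len(father)
    return {g: n / total for g, n in sorted(counts.items())}

# Two parents who each carry one C and one T at this position.
print(offspring_genotypes("CT", "CT"))  # {'CC': 0.25, 'CT': 0.5, 'TT': 0.25}
```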
If both copies are the same (e.g. CC at a given SNP position), you're homozygous at that position. If they differ (e.g. CT), you're heterozygous. This matters because many variants only show significant effects in people who are homozygous for the less common allele, meaning they carry two copies of it. Heterozygous carriers often land somewhere in between.
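To make the terms concrete, the sketch below classifies a two-letter genotype and counts how many copies of one chosen allele it contains, using the T allele of MTHFR C677T as the example. The function name and its `tracked_allele` parameter are illustrative, not taken from any particular tool.

```python
def describe_genotype(genotype: str, tracked_allele: str) -> tuple[str, int]:
    """Classify a two-letter genotype and count copies of the tracked allele."""
    if len(genotype) != 2:
        raise ValueError("expected exactly two alleles, e.g. 'CT'")
    zygosity = "homozygous" if genotype[0] == genotype[1] else "heterozygous"
    return zygosity, genotype.count(tracked_allele)

for g in ["CC", "CT", "TT"]:
    print(g, describe_genotype(g, "T"))
# CC ('homozygous', 0)
# CT ('heterozygous', 1)
# TT ('homozygous', 2)
```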
The diagram above illustrates how dominant and recessive alleles interact. A dominant allele expresses itself even when only one copy is present. A recessive allele only shows up when both copies carry it. Most SNPs studied in modern genetics don't follow these simple dominant/recessive rules. Their effects are more often additive, where each copy of the variant adds a similar small increment, or codominant, where both alleles contribute to the outcome.
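The inheritance patterns above can be written as simple rules mapping the number of variant copies (0, 1 or 2) to an effect. The sketch below uses arbitrary effect units chosen for illustration; codominance, where both alleles are expressed side by side, isn't a single number on one scale, so it is left out.

```python
def genotype_effect(copies: int, model: str, per_copy: float = 1.0) -> float:
    """Effect of carrying 0, 1 or 2 copies of a variant under simple models.

    - "dominant": one copy is enough for the full effect.
    - "recessive": the full effect appears only with two copies.
    - "additive": each copy adds the same increment.
    The "full effect" is set to 2 * per_copy so the models agree at two copies.
    """
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1 or 2")
    if model == "dominant":
        return per_copy * 2 if copies >= 1 else 0.0
    if model == "recessive":
        return per_copy * 2 if copies == 2 else 0.0
    if model == "additive":
        return per_copy * copies
    raise ValueError(f"unknown model: {model}")

for model in ("dominant", "recessive", "additive"):
    print(model, [genotype_effect(c, model) for c in (0, 1, 2)])
# dominant [0.0, 2.0, 2.0]
# recessive [0.0, 0.0, 2.0]
# additive [0.0, 1.0, 2.0]
```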
What Binds to Alleles?
DNA isn't just passively sitting there. It's constantly being read, regulated, switched on and off. Several types of molecules interact with it directly.
Transcription factors are proteins that bind to specific DNA sequences near a gene and either activate or suppress its expression. Think of them as switches. A small change in the binding site sequence — from an SNP, for example — can mean a transcription factor no longer recognises it, changing how much of a protein gets made.
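A binding site is a short DNA pattern, so you can picture a transcription factor as scanning for that pattern. The sketch below uses exact string matching and a hypothetical 7-letter motif; real transcription factors tolerate some mismatches and bind with varying strength, so this only illustrates why a one-letter change inside a site can matter.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 1-based start positions where `motif` occurs exactly in `sequence`."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        if sequence[start:start + len(motif)] == motif:
            hits.append(start + 1)
    return hits

MOTIF = "TGACTCA"  # hypothetical binding site, for illustration only

reference = "CCATGACTCAGGT"
variant = "CCATGACTTAGGT"  # one C->T swap inside the site

print(find_motif(reference, MOTIF))  # [4]
print(find_motif(variant, MOTIF))    # []
```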
Methylation is a chemical modification where a methyl group (CH₃) is attached directly to a cytosine base in the DNA. When a gene's promoter (the control region just upstream of it) is heavily methylated, the gene is generally silenced and doesn't get read. This is part of epigenetics: changes in gene expression that don't alter the underlying DNA sequence. Crucially, methylation depends heavily on folate metabolism. Folate (vitamin B9) feeds into a pathway that produces SAM (S-adenosylmethionine), the universal methyl donor. If your MTHFR or MTR enzymes are impaired, SAM production can drop, which can affect methylation across the whole genome.
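In human DNA, most methylation occurs at CpG sites, where a C is immediately followed by a G. The sketch below finds those sites in a short, made-up sequence; it only locates candidate positions and says nothing about whether they are actually methylated in a given cell. The second call shows how a single C→T swap can remove one such site.

```python
def cpg_positions(sequence: str) -> list[int]:
    """Return 1-based positions of each C that is immediately followed by a G."""
    seq = sequence.upper()
    return [i + 1 for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]

print(cpg_positions("ACGTTCGACCGA"))  # [2, 6, 10]
# Same sequence with the C at position 6 swapped for a T: that CpG site is gone.
print(cpg_positions("ACGTTTGACCGA"))  # [2, 10]
```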
Hormones such as oestrogen, testosterone, and cortisol bind to receptor proteins that then act as transcription factors. This is why hormone levels can have broad effects on gene expression, and why SNPs in receptor genes (like oestrogen receptor alpha, ESR1) can alter sensitivity to hormonal signals.
Drugs, including antifolates, often work by mimicking or blocking the natural molecules that bind to gene products. Methotrexate, for instance, blocks DHFR (dihydrofolate reductase) by fitting into the same binding site as dihydrofolate, jamming the folate cycle and halting rapidly dividing cells. SNPs in DHFR, TYMS, and related genes can affect how well (or poorly) antifolate drugs work in your body.
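Blocking an enzyme by occupying the binding site its natural substrate uses is known as competitive inhibition. The sketch below uses the textbook Michaelis-Menten equation for that case, in which the inhibitor raises the apparent Km by a factor of (1 + [I]/Ki). It is a simplification: real drug binding, including methotrexate's very tight binding to DHFR, is more complex than this model, and all numbers here are hypothetical.

```python
def reaction_rate(substrate: float, vmax: float, km: float,
                  inhibitor: float = 0.0, ki: float = 1.0) -> float:
    """Michaelis-Menten rate with a competitive inhibitor.

    The inhibitor competes for the substrate's binding site, which raises the
    apparent Km by a factor of (1 + [I]/Ki). Units are arbitrary but consistent.
    """
    apparent_km = km * (1 + inhibitor / ki)
    return vmax * substrate / (apparent_km + substrate)

# Hypothetical numbers: substrate at its Km, with no inhibitor and then with
# an inhibitor present at 3x its Ki.
print(reaction_rate(substrate=1.0, vmax=100.0, km=1.0))                         # 50.0
print(reaction_rate(substrate=1.0, vmax=100.0, km=1.0, inhibitor=3.0, ki=1.0))  # 20.0
```

Raising the substrate concentration can partly overcome a competitive inhibitor, because the two are competing for the same site.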
What is a Haplogroup?
As humans migrated out of Africa over tens of thousands of years, populations became geographically separated. Over time, each group accumulated its own set of SNPs: small mutations that arose and spread within that group but not others. A haplogroup is a cluster of such SNPs that are reliably inherited together, because they sit on DNA that is passed down with little or no shuffling by recombination. It acts as a genetic fingerprint for an ancestral lineage.
There are two main types. Mitochondrial haplogroups trace your maternal line: mitochondrial DNA is passed only from mother to child, so it accumulates mutations linearly through the maternal lineage without being shuffled by recombination. Mitochondrial haplogroups have letter-based names like H, J, T, and U; H is the most common in Europeans. Y-chromosome haplogroups trace the paternal line in the same way, but only men carry them, since only men have a Y chromosome.
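Because haplogroups nest inside one another, with each branch marked by its own defining SNPs, assigning one is essentially a walk down a tree. The sketch below assumes a made-up tree with hypothetical marker names (`m1`, `m2`, ...) and exact marker matching; real assignment tools work from published phylogenies and must cope with missing or unexpected calls.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One haplogroup in a hypothetical tree, defined by the SNPs that mark it."""
    name: str
    defining_snps: set[str]
    children: list["Node"] = field(default_factory=list)

def assign_haplogroup(node: Node, sample_snps: set[str]) -> str:
    """Walk down the tree, entering a branch only if the sample has all its defining SNPs."""
    for child in node.children:
        if child.defining_snps <= sample_snps:
            return assign_haplogroup(child, sample_snps)
    return node.name

# Hypothetical tree and marker names, for illustration only.
tree = Node("root", set(), [
    Node("A", {"m1", "m2"}, [Node("A1", {"m5"})]),
    Node("B", {"m3"}, [Node("B1", {"m4"}), Node("B2", {"m6"})]),
])

print(assign_haplogroup(tree, {"m3", "m4"}))  # B1
print(assign_haplogroup(tree, {"m1", "m2"}))  # A
```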
Haplogroups matter for functional genetics mainly through population frequency data. An SNP that's common in one haplogroup may be rare in another, which affects how we interpret its significance. This is why this site uses Caucasian population frequencies from NCBI as the reference: it's the population context that matches the data.
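One common way to express how common a variant is in a population is its allele frequency: the share of all copies at that position that carry the variant. The sketch below computes it from genotype counts; the counts are hypothetical, chosen only to show how the same SNP can be common in one group and rare in another.

```python
def allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """Frequency of the variant allele from genotype counts.

    Each person carries two alleles, so heterozygotes contribute one copy of
    the variant and variant homozygotes contribute two.
    """
    total_alleles = 2 * (hom_ref + het + hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotypes counted")
    return (het + 2 * hom_alt) / total_alleles

# Hypothetical genotype counts for the same SNP in two groups of 100 people.
pop_1 = allele_frequency(hom_ref=36, het=48, hom_alt=16)
pop_2 = allele_frequency(hom_ref=81, het=18, hom_alt=1)
print(f"{pop_1:.2f} vs {pop_2:.2f}")  # 0.40 vs 0.10
```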