Y-chromosome haplotree growth

World’s largest Y-DNA Haplotree from FamilyTreeDNA (Image generated using iTOL; FTDNA Blog)

The male part of humankind has played an important role in identifying people by their origin and belonging. Men often set out in search of new land, trade or better resources, while women and children stayed in camps. They were therefore more mobile, even at a time when walking was the only means of travel. These facts are important for understanding and interpreting the human haplotree, which is built from tests of the Y chromosome (carried only by men).

Family Tree DNA is a pioneer of genetic genealogy (hosting a Slovenian origin project among hundreds of others). It began offering commercial tests around the time the human genome was first sequenced (the Human Genome Project was completed in 2003; the remaining gaps in some parts of the DNA have been sequenced only recently). It systematically established the world’s largest DNA database for genealogy that comprises not only autosomal DNA (atDNA) but also Y-chromosome and mitochondrial DNA (mtDNA) records. Nowadays their partner MyHeritage, with which they share the testing laboratory, has a larger database of atDNA results, but Family Tree DNA remains the leading company for Y-chromosome and mitochondrial DNA records, which give specific ancestry information on the direct lines. You are in the best hands when you test with MyHeritage and Family Tree DNA, as the atDNA results are exchangeable and both companies interpret the results obtained. Just bear in mind that a biological sample sent to Family Tree DNA for atDNA testing (Family Finder) will be stored, so you can later order further tests such as mtDNA and Y-chromosome DNA. A sample sent to MyHeritage will be discarded once their results have been obtained.

Family Tree DNA is well known for its work with National Geographic on the Genographic Project (2005 to 2019). This was the first project that attempted to build the family tree of humankind and to interpret migrations over the last 200,000 years based on analysis of STR markers on the Y chromosome. I was inspired by the book The Journey of Man by Spencer Wells, which introduced me to DNA testing and haplogroups. At that time they tested very few SNP markers (single nucleotide polymorphisms) and focused more on STR markers (short tandem repeats).

An SNP, as a genetic marker, is a change of a single nucleotide (one letter of the DNA alphabet A, C, G, T) at a given position in the DNA sequence. Some SNPs are associated with a particular phenotype (such as hair colour), but genealogical genetic analyses use them differently: they check which SNP markers are shared among descendants. A genetic marker is a physical location on a chromosome (a locus). STRs are also used as genetic markers. For example, the Y-DNA67 test examines 67 markers with short tandem repeats (STRs). To learn more about how Y-chromosome testing evolved over time, watch the video from the RootsTech presentation.
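The difference between the two marker types can be illustrated with a minimal Python sketch. The sequences, the GATA motif and the function names below are invented for illustration only; real Y-DNA analysis works on laboratory reads and is far more involved.

```python
import re


def count_str_repeats(sequence: str, motif: str) -> int:
    """Return the number of repeats in the longest uninterrupted run of motif."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def snp_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference base, sample base) where two equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


# Toy STR marker: the value reported for the marker is the repeat count.
tester_1 = "ACGT" + "GATA" * 12 + "TTCA"  # hypothetical sequence
tester_2 = "ACGT" + "GATA" * 13 + "TTCA"  # hypothetical sequence
print(count_str_repeats(tester_1, "GATA"), count_str_repeats(tester_2, "GATA"))

# Toy SNP: a single letter differs at one position.
print(snp_differences("ACGTTGCA", "ACGTCGCA"))
```

An STR test such as Y-DNA67 would report 67 such repeat counts, while an SNP is recorded as a single changed letter at a known position.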

The exponential growth of the Y haplotree is due to next-generation sequencing technology, offered under the name “Big Y” and introduced by Family Tree DNA in November 2013 (see the table below).

FTDNA Y haplotree in 2010 and 2020 (it can no longer be shown on a poster). Source: FTDNA at RootsTech.

Big Y analysis has changed the accuracy and depth of Y-DNA testing. However, the raw results would not help an average user much, because interpreting them takes knowledge and equipment that most of us lack. We need the strong scientific support of Family Tree DNA in interpreting the test results. Even haplogroup determination is not fully automatic. The SNP markers are carefully examined by a specialist, Michael Sager, who also manages the Y haplotree. He assesses incoming datasets, which usually come from Big Y results but may come from other sources such as academic research, and carefully builds out the branches of the tree. At Family Tree DNA they say that it is a one-person job and not a role that anyone else can fill. For more information on that process, watch this video from RootsTech. When a Big Y test is completed, it receives an automated haplogroup assignment based on the existing tree, but no new branches are added to the tree and no haplogroups are updated until Sager intervenes.

So pay attention to your Big Y results: your haplogroup can be updated several times over the years as the database of testers grows and new information becomes available. Details for some haplogroups and lineages are still missing, so those who have tested only STR markers (Y-37, Y-67 or similar) are encouraged to upgrade to Big Y for a better placement on the global Y haplotree. Remember that an SNP must be observed in at least two related men before it can be added to the haplotree as a new branch. So unless you want to stay with your “private variants”, contact your Y-37 matches (or matches on similar STR tests) and invite them to invest in a Big Y test, which places them on the tree and also contributes to the general benefit of humankind.
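The two-tester rule can be pictured with a small Python sketch. This is only an illustration of the idea, not Family Tree DNA's actual process, which involves manual review; the tester names, variant names and the function `candidate_branches` are all hypothetical.

```python
from collections import defaultdict


def candidate_branches(private_variants: dict[str, set[str]],
                       min_testers: int = 2) -> dict[str, list[str]]:
    """Return variants carried by at least min_testers testers, with their carriers."""
    carriers = defaultdict(list)
    for tester, variants in private_variants.items():
        for variant in variants:
            carriers[variant].append(tester)
    return {v: sorted(t) for v, t in carriers.items() if len(t) >= min_testers}


# Hypothetical private variants reported for three testers.
results = {
    "tester_A": {"var1", "var2"},
    "tester_B": {"var2", "var3"},
    "tester_C": {"var4"},
}
print(candidate_branches(results))
```

In this toy data only `var2` is seen in two testers, so only it could become a candidate for a new branch; the other variants stay private until a second tester shares them.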

The exponential growth of the Y haplotree over the last 20 years is shown in the table below (Source: FTDNA at the RootsTech conference).

Year of development | Number of variants | Developer of Y-haplotree
2002 | 245 SNPs | Y Chromosome Consortium Tree
2006 | 436 SNPs | ISOGG tree
2008 | 790 SNPs | ISOGG tree
2010 | 935 SNPs | ISOGG tree
2012 | 2,067 SNPs | ISOGG tree
2013, September | 3,610 SNPs | ISOGG tree
2014, April | 6,200 SNPs | Big Y (introduced in November 2013)
2016, November | 23,767 SNPs | Big Y
2017, November | 58,590 SNPs | Big Y-500
2018, May | 100,000 SNPs | Big Y-500
2019, January | 200,000 SNPs, 20k branches | Big Y-700 released, 50% more SNP coverage
2020, December | 350,000 SNPs, 30k branches | Big Y-700
2021, May | 400,000 SNPs, 40k branches | Big Y-700
2022, January 1 | 470,000 SNPs, 50k branches | Big Y-700
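One way to put a number on this growth is to estimate how long the tree took to double in size before and after Big Y. The sketch below is a rough calculation, assuming steady exponential growth between two rows of the table; where the table gives no month or day, the 1st of January (or the 1st of the month) is an assumption made here.

```python
import math
from datetime import date

# SNP counts copied from the table; missing months and days are assumed (see above).
ISOGG_START = (date(2002, 1, 1), 245)
ISOGG_END = (date(2013, 9, 1), 3_610)
BIG_Y_START = (date(2014, 4, 1), 6_200)
BIG_Y_END = (date(2022, 1, 1), 470_000)


def doubling_time_years(start, end):
    """Doubling time in years, assuming steady exponential growth between two points."""
    (start_date, start_count), (end_date, end_count) = start, end
    years = (end_date - start_date).days / 365.25
    return years * math.log(2) / math.log(end_count / start_count)


print(f"2002 to 2013: {doubling_time_years(ISOGG_START, ISOGG_END):.1f} years")
print(f"2014 to 2022: {doubling_time_years(BIG_Y_START, BIG_Y_END):.1f} years")
```

Under these assumptions the number of SNPs on the tree doubled roughly every three years before Big Y and in a little over one year afterwards.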

Big Y analysis was a milestone in Y-chromosome testing and in building the global Y haplotree. However, genealogists know very well that the direct paternal line is only a small piece of our ancestry compared with all the other ancestral lines. Testing all the other chromosomes therefore gives much more information, both on recent relatives from the last 6 generations and on paleo-ancestry. It is very interesting to compare our results with archaeological samples and to read the MyOrigins interpretation of how parts of our genome reflect ancient ethnicities and their migrations.

However, the next milestone was reached recently by an international team of scientists, who combined genetic data from 3,609 individual genome sequences from 215 populations around the globe to produce the largest family tree ever. It identifies nearly 27 million ancestors and where they lived, dating back more than 100,000 years (Source: A unified genealogy of modern and ancient genomes / Oxford University Big Data Institute). The authors conclude: “Whole-genome genealogies provide a powerful platform for synthesizing genetic data and investigating human history and evolution”.