Y-chromosome haplotree growth

World’s largest Y-DNA Haplotree from FamilyTreeDNA (Image generated using iTOL; FTDNA Blog)

Men have long played an important role in how people are identified by origin and belonging. They were often the ones who went in search of new land, trade, or better resources, while women and children stayed in the camps. As a result, men were the more mobile part of the population even when travelling on foot was the only option. These facts matter for understanding and interpreting the human haplotree, which is built from tests of the Y chromosome (carried only by men).

Family Tree DNA is a pioneer of DNA testing for genealogy (it hosts a Slovenian origin project among hundreds of others). It started offering commercial tests in the early 2000s, around the time the human genome was first sequenced in 2003 (the remaining gaps in some parts of the DNA have been sequenced only recently). It systematically built the world’s largest genealogy DNA database that covers not only atDNA but also Y-chromosome and mitochondrial DNA records. Today its partner MyHeritage, with which it shares a testing laboratory, has a larger database of atDNA results, but Family Tree DNA remains the leading company for Y-chromosome and mitochondrial DNA records, which give specific ancestry information about the direct paternal and maternal lines. You are in good hands when you test with MyHeritage and Family Tree DNA, as the atDNA results can be transferred between them and both companies interpret the results. Just bear in mind that a biological sample sent to Family Tree DNA for atDNA testing (Family Finder) will be kept for later purchases of tests such as mtDNA and Y-chromosome DNA, whereas a sample sent to MyHeritage will be discarded once the results have been obtained.

Family Tree DNA is well known for its work with National Geographic on the Genographic Project (2005 – 2019). The project was the first attempt to build a family tree of humankind and to interpret migrations over the last 200,000 years, based on the analysis of STR markers on the Y chromosome. I was inspired by the book The Journey of Man by Spencer Wells, which introduced me to DNA testing and haplogroups. At that time very few SNP markers (single nucleotide polymorphisms) were tested, and the focus was on STR markers (short tandem repeats).

A SNP is a change in a single letter of the DNA alphabet (A, C, G, T) at a given position in the DNA sequence. Some SNPs are associated with a particular phenotype (such as hair colour), but in genetic genealogy they matter because descendants inherit them: the analysis checks which SNP markers the tested people share. A genetic marker is a physical location on a chromosome (a locus). STRs are also used as genetic markers. For example, the Y-DNA67 test reads 67 markers with short tandem repeats (STRs). To learn more about how Y-chromosome testing has evolved over time, watch the video of the RootsTech presentation.
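The difference between the two kinds of markers described above can be sketched in a few lines of Python. This is only an illustration: the repeat counts and the short DNA strings are invented, the function names are hypothetical, and the distance is a simplified sum of repeat-count differences, not the exact method any testing company uses.

```python
# Hypothetical repeat counts at a few Y-STR markers for two testers.
tester_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
tester_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}


def str_distance(a, b):
    """Sum of absolute repeat-count differences over the markers both testers have."""
    shared = a.keys() & b.keys()
    return sum(abs(a[marker] - b[marker]) for marker in shared)


def snp_differences(reference, sample):
    """Return (position, reference letter, sample letter) wherever two sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# STR comparison: two markers differ by one repeat each.
distance = str_distance(tester_a, tester_b)

# SNP comparison: a single letter differs (invented 8-letter sequences).
snps = snp_differences("ACGTTAGC", "ACGTCAGC")

print("STR distance:", distance)
print("SNP differences:", snps)
```

An STR marker is compared by counting repeats, while a SNP is a yes/no change of one letter at one position, which is why SNPs are better suited for building a branching tree.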

The exponential growth of the Y haplotree is due to next-generation sequencing technology, offered under the name “Big Y”, which Family Tree DNA introduced in November 2013 (see the table below).

FTDNA Y haplotree in 2010 and 2020 (It cannot be shown on a poster anymore). Source – FTDNA at RootsTech.

Big Y analysis has changed the accuracy and depth of Y-DNA testing. However, the raw results are of little use to an average user, who lacks the knowledge and equipment to interpret them. We need the strong scientific support of Family Tree DNA to interpret the test results. Even the haplogroup is not determined purely automatically. The SNP markers are carefully examined by a specialist, Michael Sager, who also manages the Y haplotree. He assesses incoming datasets, which usually come from Big Y results but may come from other sources such as academic research, and carefully builds out the branches of the tree. Family Tree DNA says that this is a one-person job and not a role that anyone else can fill. For more information on the process, watch this video from RootsTech. When a Big Y test is completed, the tester receives an automated haplogroup assignment based on the existing tree, but no new branches are added and no haplogroups are updated until Sager steps in. So pay attention to your Big Y results: your haplogroup can be updated several times over the years, as the database of testers grows and new information becomes available. Details for some haplogroups and lineages are still missing, so those who have tested only STR markers (Y-37, Y-67, or similar) are encouraged to upgrade to Big Y to get a better placement on the global Y haplotree. Remember that a SNP must be observed in at least two related men before it can be added to the haplotree as a new branch. So, unless you want to stay with your “private variants”, contact your Y-37 matches (or matches from similar STR tests) and invite them to invest in a Big Y test, both to place themselves on the tree and to contribute to the general benefit of humankind.
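The "at least two men" rule can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the tester names and variant labels (V1, V2, ...) are invented, the function name is hypothetical, and the real placement on the tree is curated by hand as described above, not decided by a simple count.

```python
from collections import Counter


def split_variants(testers, min_testers=2):
    """Separate variants seen in at least `min_testers` men from private ones.

    `testers` maps a tester name to the set of variants found in his test.
    Returns (branch candidates, private variants per tester).
    """
    counts = Counter(v for variants in testers.values() for v in set(variants))
    candidates = sorted(v for v, n in counts.items() if n >= min_testers)
    private = {
        name: sorted(v for v in variants if counts[v] < min_testers)
        for name, variants in testers.items()
    }
    return candidates, private


# Invented example data: three testers with partly shared variants.
testers = {
    "tester_1": {"V1", "V2", "V3"},
    "tester_2": {"V1", "V2", "V4"},
    "tester_3": {"V1", "V5"},
}

candidates, private = split_variants(testers)
print("Candidates for a new branch:", candidates)
print("Private variants:", private)
```

In this toy data V3, V4 and V5 stay private until a second tester carries them, which is why inviting a match to take the same test can turn a private variant into a named branch.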

The exponential growth of the Y haplotree over the last 20 years is shown in the table below (source: FTDNA at the RootsTech conference).

| Year of development | Number of variants | Developer of Y-haplotree |
|---|---|---|
| 2002 | 245 SNPs | Y Chromosome Consortium Tree |
| 2006 | 436 SNPs | ISOGG tree |
| 2008 | 790 SNPs | ISOGG tree |
| 2010 | 935 SNPs | ISOGG tree |
| 2012 | 2,067 SNPs | ISOGG tree |
| 2013, September | 3,610 SNPs | ISOGG tree |
| 2014, April | 6,200 SNPs | Big Y introduced in November 2013 |
| 2016, November | 23,767 SNPs | Big Y |
| 2017, November | 58,590 SNPs | Big Y-500 |
| 2018, May | 100,000 SNPs | Big Y-500 |
| 2019, January | 200,000 SNPs, 20k branches | Big Y-700 released, 50% more SNP coverage |
| 2020, December | 350,000 SNPs, 30k branches | Big Y-700 |
| 2021, May | 400,000 SNPs, 40k branches | Big Y-700 |
| 2022, 1st of January | 470,000 SNPs, 50k branches | Big Y-700 |
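To get a feeling for how fast the tree grew, the SNP counts from the table can be turned into an average yearly growth factor. The sketch below is a rough calculation only: the SNP counts come from the table, but the decimal years (for example 2013.7 for September 2013) are my own approximations and the function name is hypothetical.

```python
def yearly_growth_factor(start_year, start_count, end_year, end_count):
    """Average factor by which the SNP count multiplied per year (compound growth)."""
    years = end_year - start_year
    return (end_count / start_count) ** (1 / years)


# SNP counts taken from the table; decimal years are approximations.
overall = yearly_growth_factor(2002.0, 245, 2022.0, 470_000)
before_big_y = yearly_growth_factor(2002.0, 245, 2013.7, 3_610)
with_big_y = yearly_growth_factor(2014.3, 6_200, 2022.0, 470_000)

print(f"Overall 2002-2022:  x{overall:.2f} per year")
print(f"Before Big Y:       x{before_big_y:.2f} per year")
print(f"Since Big Y (2014): x{with_big_y:.2f} per year")
```

Under these assumptions the yearly growth factor is clearly higher in the Big Y period than in the years before it, which matches the jump visible in the table.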

Big Y analysis was a milestone in Y-chromosome testing and in building the global Y haplotree. However, genealogists know very well that the direct paternal line is only a small piece of a person’s ancestry compared with all the other ancestral lines. Testing all the other chromosomes therefore gives much more information, both on recent relatives from the last 6 generations and on paleo-ancestry. It is very interesting to compare our results with archaeological samples and to read the MyOrigins interpretation of how parts of our genome reflect ancient ethnicities and their migrations.

However, the next milestone was reached recently by an international team of scientists, who combined 3,609 individual genome sequences from 215 populations around the globe to produce the largest family tree ever. It identifies nearly 27 million ancestors and where they lived, dating back more than 100,000 years (Source: A unified genealogy of modern and ancient genomes/Oxford University Big Data Institute). The authors conclude: “Whole-genome genealogies provide a powerful platform for synthesizing genetic data and investigating human history and evolution”.

Which DNA testing company to use for genealogy?

It is not all in the size of a database of service providers. Good tools for genealogy matter. See how to make the DNA testing services work for your goals in genealogy.

Recently, I commented on the Family History Fanatics analysis “What is the Best DNA Testing Company for Genetic Genealogy Research?”, which they usually prepare at the beginning of the year. I thought it would be nice to share here some of the experience I have gained so far in genetic genealogy. They scored different features and rated the best companies mainly by the size of the database available for matching.

Andy Lee’s scoring gives first place to Ancestry, followed by MyHeritage, GEDmatch, 23andMe, FamilyTreeDNA and LivingDNA. The main criterion was the size of the user database, which grew significantly over the last year at the companies in first and second place (Source: YouTube, Family History Fanatics).

I was glad to read another comment concluding that “having all three – FTDNA, Ancestry and GEDmatch is probably ultimately better than only having one”, as I share that view completely. It encouraged me to add a fourth one to these three: MyHeritage. I use all four of them to benefit from their best features. I recommended the same to my colleagues from the Slovenian Genealogy Society and to other genealogists who joined my Club of Genetic Genealogy on Wednesday, 27 January 2021.

Here is my experience of how to include DNA testing as a tool in your genealogy research:

1. I tested the atDNA of several people at MyHeritage, where the results take on a life of their own in matching: all the tools are built in and presented in a user-friendly way for exploring linked matches and their family trees. I especially love their new ethnicity estimates.

2. Then I transferred the DNA test data from MyHeritage to FamilyTreeDNA, which has comparison tools as good as those of GEDmatch. The tools are built into the system so that sample donors who are not experts in genetic genealogy can use them easily.

3. I reach beyond the FTDNA database by exporting the raw data to GEDmatch, where I run further analyses (at least One-to-Many, and then One-to-One for the best matches). As I am from the EU, I appreciate the data protection compliance (GDPR) of both FTDNA and GEDmatch.

4. In FamilyTreeDNA I have organized a country-wide project and a surname project for all tested people of that origin or surname. This tool is unique among service providers and enables citizen science and further genealogy research. As one of the admins, I can help the other 200 members to improve their pedigree charts or to do additional Y-chromosome and mtDNA testing.

5. FTDNA has recently improved its genealogy part with myTree, which shows the Shared Origins of the tested ethnicity as well as the Y-chromosome and mtDNA haplogroups, linked to a profile with ancestral surnames and places of origin. A wonderful identity card of the MRCA, also for the time after we are gone… And there is no subscription for my FTDNA account: everything is paid for through the tests ordered.

6. The size of the database is indeed important for matching, but the FTDNA database is big enough for a successful start, especially for those of European origin. I spent two years researching my matches there. If I find a match by surname, origin, or other data among atDNA testers at 23andMe, Ancestry, or MyHeritage, I invite them to import their results to FTDNA and join our country or surname project. They do not need to test again; they only need to unlock the comparison tools available there. Later, when they become interested, they usually buy a Y-chromosome test (for men only) and a mitochondrial DNA test (for anyone) to place themselves in the deep history of their paternal and maternal lines and on the phylogenetic trees.

7. In December 2020, I bought an Ancestry subscription and then also ordered a DNA test to find remote cousins whose ancestors crossed the ocean in search of a better life before WWI. My Ancestry results are not ready yet, but I am really looking forward to fishing in their big DNA pool.

8. Last year I also discovered a fifth company, MyTrueAncestry, which I now use for Y-haplotree and mtDNA-haplotree matching against samples from about 3000 years before present to 1600 AD. Just export your atDNA results from any of your favorite testing companies and import them into MyTrueAncestry; one sample can be compared for free. Voilà, an incredible personal history is in front of you…

So, it is not all in the size of a database of service providers. Good tools for genealogy matter. We need to make those testing services work for our goals in genealogy 🙂