Downside to DNA, Part 2

March 24, 2016

Earlier, I linked to an article showing that, because DNA is not handed down in even chunks from all your ancestors, you won’t match most of the people you’re related to on paper beyond about fourth cousins.

False Matches

There’s another aspect to DNA testing which makes it not entirely reliable, which is that you almost certainly have a number of false matches in the database: people who seem to share DNA with you but who are not related to you in any genealogical sense.

They may be related to you in terms of coming from a common population, or they may not be related to you at all. Worse still, they may be related to you on paper, but the genetic link that appears to correspond to your common ancestors may be no link at all.

I don’t think I’m able to express this better than Dr. Ann Turner did in her wonderful post in the Journal of Genetic Genealogy, so I’ll send you there. What’s worth noting, in brief, is that segment lengths of autosomal DNA (the DNA you inherit from all your ancestors) cannot be determined with much confidence without phasing, that is, working out which of the two letters (alleles) at each tested position comes from which parent.

The Math

John Walden and Dr. Tim Janzen have analyzed a sizable sample of 9,000 haplotypes, giving the probability that a match in a database survives phasing on both sides and is therefore likely IBD (identical by descent, meaning you share the segment through a common ancestor) as opposed to IBS (identical by state, as Ann described in the article I linked to above).

There is always a trade-off in statistics between sensitivity, that is, including all relevant examples, and precision, that is, being sure the examples we have caught are relevant.

Now testing companies have a problem. They all have to set a threshold for genetic matches somewhere. Should they set the threshold high and leave out real cousins, or set it low, and include many non-cousins?
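To make that trade-off concrete, here is a minimal Python sketch. The segment list is entirely made up for illustration (it is not the Walden and Janzen data or any company's database), and the names `TOY_SEGMENTS` and `evaluate_threshold` are hypothetical. It only shows the mechanics: raising the threshold drops real cousins, while lowering it lets in more false matches.

```python
# Toy example only: each entry is (segment length in cM, is the match really IBD?).
# These values are invented to illustrate the mechanics, not measured data.
TOY_SEGMENTS = [
    (5.2, False), (5.8, False), (6.1, True), (6.5, False),
    (7.0, False), (7.3, True), (7.7, True), (8.4, False),
    (9.1, True), (10.5, True), (12.0, True),
]


def evaluate_threshold(segments, threshold_cm):
    """Return (recall, precision) when reporting every segment >= threshold_cm.

    recall    = share of the real (IBD) matches that are reported
    precision = share of the reported matches that are real (IBD)
    """
    reported = [is_ibd for cm, is_ibd in segments if cm >= threshold_cm]
    real_total = sum(1 for _, is_ibd in segments if is_ibd)
    real_reported = sum(1 for is_ibd in reported if is_ibd)
    recall = real_reported / real_total if real_total else 0.0
    precision = real_reported / len(reported) if reported else 0.0
    return recall, precision


if __name__ == "__main__":
    for threshold in (5.0, 7.0, 9.0):
        recall, precision = evaluate_threshold(TOY_SEGMENTS, threshold)
        print(f"threshold {threshold:4.1f} cM: "
              f"recall {recall:.2f}, precision {precision:.2f}")
```

With this toy list, the lowest threshold keeps every real match but reports many false ones, and the highest threshold reports only real matches but loses half of them.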

[Figure: SegmentsRemovedDualPhasing]

{EDIT: Corrected values and graph added April 11, 2016.}
I’ve done a very simple logistic regression analysis on the data. What the resulting S-curve shows is a steep climb from about 5 centimorgans (cM; a centimorgan is a unit of genetic distance, where 1 cM corresponds to roughly a 1% chance of recombination between two points in a single generation), where the probability of a match surviving phasing is just over 13%, to 9 cM, where the probability is around 78%. For stats wonks, this regression was done in R.
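For readers who want to poke at the shape of that curve, here is a minimal Python sketch. It is not the regression I ran in R on the Walden and Janzen data. It simply assumes a one-variable logistic curve in segment length and pins it to the two readings quoted above (about 13% at 5 cM and about 78% at 9 cM). The function names are made up for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def fit_two_point_logistic(x1, p1, x2, p2):
    """Intercept and slope of the logistic curve passing through two points.

    Assumption: probability = 1 / (1 + exp(-(intercept + slope * cM))).
    This is a two-point reconstruction, not a fit to the underlying data.
    """
    slope = (logit(p2) - logit(p1)) / (x2 - x1)
    intercept = logit(p1) - slope * x1
    return intercept, slope


def survival_probability(cm, intercept, slope):
    """Sketched probability that a match of length `cm` survives phasing."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * cm)))


# Anchor points taken from the approximate values quoted in the post.
INTERCEPT, SLOPE = fit_two_point_logistic(5.0, 0.13, 9.0, 0.78)

if __name__ == "__main__":
    for cm in (5.0, 6.0, 7.0, 7.69, 8.0, 9.0):
        p = survival_probability(cm, INTERCEPT, SLOPE)
        print(f"{cm:5.2f} cM -> {p:.1%}")
```

Evaluated at 7.0 cM and 7.69 cM, this rough two-point curve lands within about a percentage point of the 42.6% and 56% figures the fitted model gives for those lengths, but treat it as an approximation only.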

In brief, what this means is that the testing companies, if they are using unphased data, have to set their threshold somewhere on that steep climb, and so include a whole bunch of uncertain matches. Many, and perhaps most, of your matches will in fact fall into the “grey zone.”
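One way to put rough edges on the grey zone is to turn the curve around and ask what segment length corresponds to a given probability of surviving phasing. The Python sketch below does that under a loud assumption: it uses a simple logistic curve pinned to the two approximate readings from the regression (13% at 5 cM, 78% at 9 cM), not the fitted model itself. `threshold_for` is a hypothetical helper name.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


# Two approximate readings quoted in the post: (cM, probability of surviving phasing).
X1, P1 = 5.0, 0.13
X2, P2 = 9.0, 0.78

# Assumed model: logit(probability) = INTERCEPT + SLOPE * cM.
SLOPE = (logit(P2) - logit(P1)) / (X2 - X1)
INTERCEPT = logit(P1) - SLOPE * X1


def threshold_for(target_probability):
    """Segment length (cM) at which the sketched curve reaches the target."""
    return (logit(target_probability) - INTERCEPT) / SLOPE


if __name__ == "__main__":
    for target in (0.25, 0.50, 0.75, 0.90):
        print(f"{target:.0%} survival -> about {threshold_for(target):.2f} cM")
```

Under these assumptions the even-odds point falls between 7 and 8 cM, which is right where the testing companies tend to draw their lines.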

23andme

23andme sets its minimum segment length at 7.0 cM, at which the model gives a 42.6% chance that the match survives phasing.

Unfortunately, for endogamous populations or those that underwent a founder effect (and the 23andme database has strong representation from at least two such populations: people of early New England descent, and Ashkenazi Jews), the majority of one’s matches are likely to be false, because at 7.0 cM a match has a less-than-even probability of being IBD. Add to this that 23andme caps Relative Finder matches at 1,500 and populates the Relative Finder list by total centimorgans shared rather than by long blocks, and 23andme becomes less desirable as a genealogical tool for those with significant early American, Ashkenazi, or otherwise bottlenecked ancestry (such as French Canadians).

Family Tree DNA

Anecdotally, Family Tree DNA’s Family Finder appears to set its smallest segment at 7.69 cM, which offers a 56% probability of a match surviving phasing, according to the model. This seems to me to strike a better balance, but even then one must be leery of shorter matches.

DNA.Ancestry.com

Ancestry DNA is a little trickier to compare. At first I couldn’t find anything on their methodology, but on a tip, I was pointed to this.

The upshot is, they phase short segments, so one will be less likely to come up with matches by chance, but may still have many ancient population-level matches.

They still do not have a chromosome browser, so it is impossible to verify that a putative match overlaps others who share that ancestral line.

Conclusions

However you slice it, though, a number of your matches will be identical by state and will not reflect a shared genealogical ancestor. For a breakdown of how IBS could mean either identical by chance or identical by population (particularly relevant to close-knit or bottlenecked populations), I refer you to a recent blog post by Roberta Estes.

One Comment
  1. April 10, 2016 5:13 pm

    A brief update: Blaine Bettinger made pretty well the exact same point on his blog:

    http://thegeneticgenealogist.com/2014/12/02/small-matching-segments-friend-foe/

    Further, it’s been brought to my attention that Ancestry released their methodology here.
