Demolition of the ENCODE project defining function in 80% of the human genome

ENCODE logo – the 2012 paper is discredited

Dan Graur and colleagues have just published a damning criticism of a remarkable paper on the human genome that summarized a vast research project (perhaps costing more than $250 million) co-authored by some 476 scientists. The September 2012 paper, from the ENCODE (ENCyclopedia Of DNA Elements) project, has a succinct abstract claiming function for 80% of the 3.2 billion nucleotides in the human genome. (Another 29 publications appeared from subsets of the authors at that time, and others continue to be published.)

I was sceptical of the nature of the analysis, although at the time too ill to do anything about it. With my lab’s focus on repetitive DNA evolution, and the advantage of working with several species and, for each, their near and more distant relatives, I could not understand how the ENCODE paper could contain no mention of tandem repeats or tandemly arrayed DNA sequences, of transposons, LINEs, SINEs, Alu elements or rDNA repeats, only a single mention of retrotransposons (in a section comparing human and other primates), and no discussion of other key and abundant DNA features of the genome, let alone the repetitive DNA associated with structural chromosome components, the centromeres and telomeres. Together, these classes of repetitive element represent half or more of the DNA in the genomes of all organisms except those with the tiniest genomes (less than c. 200 million bp, Mbp), so it is hard to understand how they can go unmentioned in the main ENCODE project report. Furthermore, many of these elements do not follow the evolutionary pathways conventional for most genes and regulatory sequences, but change in copy number by unequal crossing over, slippage replication and other mechanisms, while also showing extensive within- and between-chromosome homogenization of sequence.

The ENCODE Project Consortium 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74. http://www.nature.com/nature/journal/v489/n7414/pdf/nature11247.pdf

Now, Dan Graur, four colleagues from the University of Houston (http://nsm.uh.edu/~dgraur/) and Eran Elhaik from Johns Hopkins University (http://eelhaik.aravindachakravartilab.org/) have published a report pointing out substantial fallacies in the arguments and definitions used in the ENCODE paper: redefining words (“using the wrong definition wrongly”), using logical fallacies including “affirming the consequent”, equating A being a subset of B with B having the properties of A, misuse of population genetics and evolutionary concepts, and other methodological errors. They also note the numerous press conferences and public relations activities that accompanied publication of the papers, and the fact that ENCODE says it “assign[s] biochemical functions for 80% of the genome”, while investigators from the project quote many other figures ranging from 20% through 40% and upwards (Graur et al. noting “Unfortunately, neither 80% nor 20% are based on actual evidence”)!

Graur, D., Zheng, Y., Price, N., Azevedo, R. B. R., Zufall, R. A., & Elhaik, E. (2013). On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution. Preprint: http://dx.doi.org/10.1093/gbe/evt028

Abstract: A recent slew of ENCODE Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is under 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 − 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these “functional” regions, or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly (1) by employing the seldom used “causal role” definition of biological function and then applying it inconsistently to different biochemical properties, (2) by committing a logical fallacy known as “affirming the consequent,” (3) by failing to appreciate the crucial difference between “junk DNA” and “garbage DNA,” (4) by using analytical methods that yield biased errors and inflate estimates of functionality, (5) by favoring statistical sensitivity over specificity, and (6) by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.

The ENCODE Project Consortium 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74. http://www.nature.com/nature/journal/v489/n7414/pdf/nature11247.pdf


Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.


Any adverts below are not associated with this site and return no money to me (except that I don’t pay for WordPress).



  1. themayan says:

    Just reading the abstract alone sounded more like a hit piece than professional scientific journalism. The mean spirited tone reeked of anger and bias.

    As I read further, I was surprised to find the authors paraphrasing Frank Zappa. Don’t get me wrong, I loved Zappa, but I think even he would say that it would be very silly to use any of his utterances in a science journal, and especially one which seems to be more personalized than unbiased.

    “Data is not information, information is not knowledge, knowledge is not wisdom, wisdom is not truth,” —Robert Royar (1994) paraphrasing Frank Zappa’s (1979) anadiplosis

    I also found it interesting that they quoted T. R. Gregory, who is critical of ENCODE for completely different reasons. According to Gregory, we supposedly knew about function decades ago, so this should be no big surprise. Of course, as I said before, I had to remind him that maybe one of the problems lay in the fact that many scientists ignored this data (they should have just stuck to science and not got involved in the culture war), as it is well documented that many instead held this useless junk-DNA paradigm as a poster child for bad design, with all this supposed empirical evidence to back it up. Like many others, Gregory follows the logic that if the data is incongruent with the theory, then the data must be wrong, as he speaks of his “onion test” concerning the C-value paradox below.

    “The onion test is a simple reality check for anyone who thinks they can assign a function to every nucleotide in the human genome. Whatever your proposed functions are, ask yourself this question: Why does an onion need a genome that is about five times larger than ours?” —T. Ryan Gregory

    Dan Graur: “playing fast and loose with the term ‘function,’ by divorcing genomic analysis from its evolutionary context and ignoring a century of population genetics theory”…

    Dan, maybe it’s time to update these 80-year-old constructs, as this paper below, which is one of many, indicates…

    The new biology: beyond the Modern Synthesis. Michael R. Rose and Todd H. Oakley. “The last third of the 20th Century featured an accumulation of research findings that severely challenged the assumptions of the ‘Modern Synthesis’ which provided the foundations for most biological research during that century. The foundations of that ‘Modernist’ biology had thus largely crumbled by the start of the 21st Century. This in turn raises the question of foundations for biology in the 21st Century.”

    Dan Graur: “There are two almost identical sequences in the genome. The first, TATAAA, has been maintained by natural selection to bind a transcription factor, hence, its selected effect function is to bind this transcription factor. A second sequence has arisen by mutation and, purely by chance, it resembles the first sequence; therefore, it also binds the transcription factor. However, transcription factor binding to the second sequence does not result in transcription, i.e., it has no adaptive or maladaptive consequence. Thus, the second sequence has no selected effect function, but its causal role function is to bind a transcription factor”

    Here is what ENCODE’s lead analysis coordinator E. Birney says about this: “Rather than being inert, the portions of DNA that do not code for genes contain about 4 million so-called gene switches, transcription factors that control when our genes turn on and off and how much protein they make, not only affecting all the cells and organs in our body, but doing so at different points in our lifetime. Somewhere amidst that 80% of DNA, for example, lie the instructions that coax an uncommitted cell in a growing embryo to form a brain neuron, or direct a cell in the pancreas to churn out insulin after a meal, or guide a skin cell to bud off and replace a predecessor that has sloughed off”

    Dan Graur: “The human genome is rife with dead copies of protein-coding and RNA-specifying genes that have been rendered inactive by mutation. These elements are called pseudogenes (Karro et al. 2007). Pseudogenes come in many flavors (e.g., processed, duplicated, unitary) and, by definition, they are nonfunctional”

    Not according to the paper below…
    PSEUDOGENES: Are They “Junk” or Functional DNA? Annual Review of Genetics, Vol. 37: 123-151 (December 2003). First published online as a Review in Advance on June 25, 2003. DOI: 10.1146/annurev.genet.37.040103.103949. “Pseudogenes have been defined as nonfunctional sequences of genomic DNA originally derived from functional genes. It is therefore assumed that all pseudogene mutations are selectively neutral and have equal probability to become fixed in the population. Rather, pseudogenes that have been suitably investigated often exhibit functional roles, such as gene expression, gene regulation, generation of genetic (antibody, antigenic, and other) diversity. Pseudogenes are involved in gene conversion or recombination with functional genes. Pseudogenes exhibit evolutionary conservation of gene sequence, reduced nucleotide variability, excess synonymous over nonsynonymous nucleotide polymorphism, and other features that are expected in genes or DNA sequences that have functional roles”…

    It seems the biggest criticism in this paper is of how the word function is used, as ENCODE’s definition of function is broad; but it also seems kind of silly not to expect such a broad definition when the findings themselves are so broad. And again, just because the findings seem incongruent with how we view selection based on the modern synthesis (and/or what Stuart Newman refers to as these old entrenched dogmas), it does not mean the theory should trump scientific revelation and the discovery of new empirical data. Maybe it’s the theory that needs changing. One very well-known scientist once told me: scientists don’t change their minds, they just die.

    • Thanks TheMayan for some valuable, and referenced, comments. Clearly, Dan Graur’s paper is not couched in the measured tones found in most refereed papers – but I think that the ENCODE paper also marked a new extreme in hype and PR over what has been published before.
      With respect to “function”, the ENCODE authors rather imply they are the first people to have thought that 80% of the DNA in the human genome is doing something. They even add the gratuitous word “biochemical” to “function” in their abstract to emphasize the point. I can see no rationale from ENCODE for the 80% figure: 100% of the DNA is arguably “functional”, because it is acted on by replication enzymes, bound by histone proteins, and inherited. There is no clear statement of what constitutes the 20% that ENCODE considers non-functional.

      I find it hard, though, to accept the long extension of “function”. DNA transposons and retrotransposons, nearly half the human genome, are sometimes transcribed, are disease markers or can cause disease, but to imply that the majority of Alu sequences, for example, have a ‘biochemical function’ is playing very loosely with words. Similarly with the satellite sequences, whether intercalary or at centromeres, and the telomeric sequences.

  2. themayan says:

    It seems to me (and based on my understanding) that the consortium went to great lengths and pains to set up a system where grandstanding was discouraged and where, instead, an accurate collective gathering and analysis of information was the goal (at least during this phase).

    Furthermore, in the past few years there have been many other independent papers, even before the release of the embargo, that also seem to refute many of these old entrenched ideas concerning ncDNA/junk DNA. And there seems to be no evidence of that train slowing down or hitting a brick wall anytime soon.

    Since this is a new emerging field, where the money and resources invested remain unrivaled, and since some of the brightest minds admit to not fully understanding much of this new data, I think only time will tell.

    I think criticism is a good thing, but I also think that, like many other criticisms in the blogosphere, this one seemed to be kind of personal and infused with a little more than just scientific criticism. But, as the old saying goes, at least we can agree to disagree. Thanks for letting me express my two cents.
