Demolition of the ENCODE project defining function in 80% of the human genome

ENCODE logo - the 2012 paper is discredited


Dan Graur and colleagues have just published a damning criticism of a remarkable paper on the human genome that summarized a vast research project (perhaps costing more than $250 million) and was co-authored by about 476 scientists. The September 2012 paper from ENCODE (the ENCyclopedia Of DNA Elements) has a succinct abstract claiming function for 80% of the 3.2 billion nucleotides in the human genome. (Another 29 publications from subsets of the authors appeared at the same time, and others continue to be published.)

I was sceptical of the nature of the analysis, although at the time too ill to do anything about it. My lab focuses on repetitive DNA evolution, with the advantage of working with several species and, for each species, its near and more distant relatives. I could not understand how the ENCODE paper could make no mention of tandem repeats or tandemly arrayed DNA sequences, transposons, LINEs, SINEs, Alu elements or rDNA repeats, make only a single mention of retrotransposons (in a section comparing humans and other primates), and include no discussion of other key and abundant DNA features of the genome, let alone the repetitive DNA associated with structural chromosome components such as centromeres or telomeres. Together, these classes of repetitive element represent something like half or more of the DNA in the genomes of all organisms except those with the tiniest genomes (c. <200 million bp, Mbp), so it is hard to understand how they can go unmentioned in the main ENCODE project report. Furthermore, many of these elements do not follow the conventional evolutionary pathways (conventional for most genes and regulatory sequences, at least), but change in copy number by unequal crossing over, replication slippage and other mechanisms, while also showing extensive within- and between-chromosome homogenization of sequence.

The ENCODE Project Consortium 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74.

Now, Dan Graur, four colleagues from the University of Houston and Eran Elhaik from Johns Hopkins University have published a report pointing out substantial fallacies in the arguments and definitions used in the ENCODE paper: redefining words (“using the wrong definition wrongly”), using logical fallacies including “affirming the consequent”, treating A being a subset of B as meaning that B has the properties of A, misuse of population genetics and evolutionary concepts, and other methodological errors. They also note the numerous press conferences and public relations activities associated with the publication of the papers, and the fact that ENCODE says they “assign biochemical functions for 80% of the genome”, while investigators from the project quote many other figures ranging from 20%, through 40% and upwards (Graur et al. note: “Unfortunately, neither 80% nor 20% are based on actual evidence”).

Graur, D., Zheng, Y., Price, N., Azevedo, R. B. R., Zufall, R. A., & Elhaik, E. (2013). On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution. Preprint: URL

Abstract: A recent slew of ENCODE Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is under 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 − 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these “functional” regions, or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly (1) by employing the seldom used “causal role” definition of biological function and then applying it inconsistently to different biochemical properties, (2) by committing a logical fallacy known as “affirming the consequent,” (3) by failing to appreciate the crucial difference between “junk DNA” and “garbage DNA,” (4) by using analytical methods that yield biased errors and inflate estimates of functionality, (5) by favoring statistical sensitivity over specificity, and (6) by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.
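The arithmetic behind the “80 − 10 = 70%” in this abstract can be made concrete with a short sketch. It is only an illustration: the 3.2 billion bp genome size is the figure quoted at the top of this post, the 10% is the upper bound stated in the abstract, and the function name is hypothetical.

```python
# Illustrative arithmetic only; the percentages are the figures quoted in the text.
GENOME_SIZE_BP = 3_200_000_000   # human genome size quoted in this post
CLAIMED_FUNCTIONAL_PCT = 80      # ENCODE's headline figure
CONSERVED_PCT = 10               # upper bound on the conserved fraction (Graur et al.)


def functional_without_selection(genome_bp, claimed_pct, conserved_pct):
    """Return (percent, base pairs) claimed functional but not conserved."""
    gap_pct = claimed_pct - conserved_pct
    # Integer arithmetic keeps the base-pair count exact.
    return gap_pct, genome_bp * gap_pct // 100


gap_pct, gap_bp = functional_without_selection(
    GENOME_SIZE_BP, CLAIMED_FUNCTIONAL_PCT, CONSERVED_PCT
)
print(f"{gap_pct}% of the genome, about {gap_bp / 1e9:.2f} billion bp")
```

Because the conserved fraction is stated as “under 10%”, the 70% is a lower bound on the share of the genome that would have to be functional without being maintained by selection.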

The ENCODE Project Consortium 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74.


The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.


Any adverts below not associated with site and not returning money to me (except I don’t pay for wordpress)

Centromeric tandem repeats: features common to 282 plant and animal species

Tandem Repeats at Centromeres


“It is important to have the validation (or not) of many concepts about satellite DNA evolution developed in the pre-genomic era, now testing whole genome sequences from a large survey of different species and repeat families all analysed comparatively with different sequencing strategies in a single and massive work,” writes Guto Kuhn at Universidade Federal de Minas Gerais, Belo Horizonte, Brasil, a collaborator with the molecular cytogenetics group, in pointing out an important paper from Melters et al. “In this paper, they analysed centromeric tandem repeats from 282 species (animals and plants) and included long Pacific Biosciences reads in their analysis. I found particularly interesting the use of PacBio reads to understand homogenization of repeat variants, including verification of higher order organization. It is important to have validation in a single work with a large survey of different species and repeat families.” There is a very informative video showing a talk by Simon Chan explaining their work.

Melters, D., Bradnam, K., Young, H., Telis, N., May, M., Ruby, J., Sebra, R., Peluso, P., Eid, J., Rank, D., Garcia, J. F., DeRisi, J., Smith, T., Tobias, C., Ibarra, J. R., Korf, I., & Chan, S. (2013). Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biology, 14 (1), R10. URL

Sadly, the last author, Simon W.-L. Chan, passed away in August 2012 from an autoimmune liver disease. I heard the tragic news while I was also lying in hospital, two months after coming down with hepatitis and similar symptoms. Simon’s outstanding work will live on forever, and I hope our lab can build on his remarkable discoveries and experiments, not only regarding repetitive DNA but also about the generation of haploid plants by chromosome elimination using mutations in the CENH3 gene (Ravi, M., & Chan, S. W. L. (2010). Haploid plants produced by centromere-mediated genome elimination. Nature, 464 (7288), 615-618. URL). I am honoured to have known him and been able to discuss research with him in the past.

Local link to provisional PDF/online-first (please check if the final paginated version appears)


Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. RESULTS: Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution. CONCLUSIONS: While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.
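The idea in the abstract of taking the most abundant tandem repeat as the candidate centromere sequence can be illustrated with a toy sketch. This is not the pipeline used by Melters et al.: it assumes exact, undiverged repeats, ignores reverse complements, and all function names and reads are hypothetical.

```python
from collections import Counter


def smallest_period(read, min_copies=3):
    """Smallest p such that the whole read is an exact repeat of period p.

    Returns None unless at least `min_copies` full copies fit in the read.
    """
    n = len(read)
    for p in range(1, n // min_copies + 1):
        if all(read[i] == read[i + p] for i in range(n - p)):
            return p
    return None


def canonical_rotation(monomer):
    """Lexicographically smallest rotation, so phase-shifted copies match."""
    return min(monomer[i:] + monomer[:i] for i in range(len(monomer)))


def most_abundant_monomer(reads, min_copies=3):
    """Return (monomer, copy_count) for the most abundant tandem repeat."""
    copies = Counter()
    for read in reads:
        p = smallest_period(read, min_copies)
        if p is not None:
            copies[canonical_rotation(read[:p])] += len(read) // p
    return copies.most_common(1)[0] if copies else None


# Hypothetical toy reads: two carry the same 4 bp repeat in different phases,
# one carries a 3 bp repeat, and one is not a tandem repeat at all.
reads = ["ACGTACGTACGT", "CGTACGTACGTA", "GGCGGCGGC", "ACGTTGCATTAG"]
print(most_abundant_monomer(reads))
```

Real centromeric arrays diverge between copies and monomers can exceed a kilobase, so a practical analysis needs approximate matching and long reads; the sketch only shows the counting logic.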


Land use, productivity and environmental impacts

Ian Crute - Talk on land use, productivity and sustainability


Ian Crute, Scientific Director of the Agriculture and Horticulture Development Board (AHDB) and former Director of both Rothamsted Research and Horticultural Research International, has posted slides from a valuable talk on 16 Jan, “Land-use in the UK: balancing productivity with environmental impacts”. I heard about this talk via Alan Spedding and the terrific resource Rusource, which gives briefings on many issues affecting agriculture and the rural environment: see the Rusource page to subscribe to the weekly newsletter and Twitter feed.

Ian’s talk brings together lots of valuable facts and figures, from the UK and globally, as well as having a nice emphasis on the genetic contribution to sustainability and production. I was just preparing lectures for our Biodiversity and Sustainability undergraduate course and will certainly base a class around this slide set.

This is not the only presentation of note at the conference: all the talks are on the Farming Futures website, and several others are strongly recommended!

Original presentation:

Edit 4 Feb: Manuscript containing much of this material:




Link for local use with PW



Biology of Brassica crops – origins, taxonomy, genetics & breeding

Brassica flowers


OECD has published (18 Dec 2012) a remarkably comprehensive overview of the biology of Brassica crops covering their origins, taxonomy, reproductive biology, genetics, ecology, plant breeding, and even intergeneric hybrids. Brassica vegetables are grown on 3.4 million hectares (Mha), while oilseeds cover no less than 26 Mha. This anonymous document (“Canada served as the lead country”) is designed to be a “snapshot” of current information for use during regulatory assessments of biotechnology, but the information is valuable to all researchers in the group too.

Original document:

and Local use link here.









Peak farmland – will there ever be an increase in farmed area?


Peak Farmland

We often hear of the concept of “peak oil”, where the idea is that the maximum rate of oil production has been reached. But the concept is vague: hydrocarbons are present in many fossil and remote or undersea formations, and there are continuous changes in extraction technology and in the efficiency and nature of uses, so the ‘peak’ becomes an economic concept. Similar riders apply to most other ‘peak’ concepts when applied to natural resources such as minerals: with a will to pay, it’s near-impossible to run out of gold, given its abundance in sea water.

What about “peak farmland”? The amount of land, give-or-take a small amount changing at coasts, is constant, but it can be brought into production as farmland by ‘clearance’ of native vegetation, or taken out of use as farmland by building, conservation and other measures, not to mention salinity and erosion.

This paper from Ausubel, Wernick and Waggoner of Rockefeller University puts forward the view that we have now reached the peak of farmland area in the world. Their data (figure 9, reproduced above) show no increase in the last 20 years, and their prediction is a decrease in farmland area from now on. They consider that the increases in crop productivity over the last centuries will continue, with both genetic and agronomic components, while neither population nor meat demand will soar.

They write: “Expecting that more and richer people will demand more from the land, cultivating wider fields, logging more forests, and pressing nature, comes naturally. The past half-century of disciplined and dematerializing demand and more intense and efficient land use encourage a rational hope that humanity’s pressure will not overwhelm nature. Beginning with the examples of crops in the large and fast-developing countries of India and China as well as the United States, we examine the recent half-century. We also look back over the past 150 years when regions like Europe and the United States became the maiden beneficiaries of chemical, biological, and mechanical innovations in agriculture from the industrial revolution. Organizing our analysis with the ImPACT identity, we examine the elements contributing to the use of land for crop production, including population, affluence, diet, and the performance of agricultural producers.”
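The ImPACT identity mentioned in the quotation treats land use as a product of factors, so small yearly percentage changes in the factors roughly add up. The sketch below only illustrates that multiplicative logic: the four factor names are an assumed reading of “population, affluence, diet, and the performance of agricultural producers”, and the rates are hypothetical numbers, not data from the paper.

```python
import math

# Hypothetical annual changes (fractions per year), NOT values from the paper.
annual_change = {
    "population": 0.010,           # more people
    "affluence": 0.020,            # income per person
    "diet_per_income": -0.015,     # food demand per unit of income
    "land_per_unit_crop": -0.017,  # inverse of yield: producer performance
}


def land_use_change(changes):
    """Exact yearly change in land use when the factors multiply."""
    product = math.prod(1.0 + r for r in changes.values())
    return product - 1.0


def approx_land_use_change(changes):
    """First-order approximation: small percentage changes simply add."""
    return sum(changes.values())


exact = land_use_change(annual_change)
approx = approx_land_use_change(annual_change)
print(f"exact: {exact:+.3%}  approx: {approx:+.3%}")
```

With these made-up rates, growth in population and affluence is outweighed by falling demand per unit of income and rising yields, so the farmed area shrinks slightly each year, which is the kind of outcome the authors project.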

Many blogs also discuss this paper:

Original paper here

Local copy for local users here

Wheat yields: comparison of pre- and post- green revolution wheats in Pakistan


High and low yielding wheat: thin stands, thin leaves and thin ears

Comparisons of crop yields over long periods are difficult to carry out and hence rarely done. Roger Austin at the Plant Breeding Institute, in an impressive paper of 1980 (rightly cited no fewer than 562 times, surely a record for J agric. Sci. Camb.), showed the increase in yield potential of UK wheats in a comparison of 12 varieties released between 1908 and the mid 1970s. Obviously wheat cultivars, yields and reduced input requirements have advanced a lot since then, although the results remain valid.

A new paper from Amarah Batool, Ijaz Rasool Noorka (currently a Visiting Research Fellow in the Molecular Cytogenetics Lab. at Leicester), M Afzal and AH Syed measures not only yields in 12 pre- and post-green revolution wheats in Pakistan, but also the performance of new hybrids. There are remarkable differences in heterosis. The paper suggests that some characters (genes) from the older varieties may not be represented in modern germplasm, and that these varieties may include useful agronomic characters for low-input and rain-fed conditions.

Estimation of Heterosis, Heterobeltiosis and Potence Ratio Over Environments Among Pre and Post Green Revolution Spring wheat in Pakistan. Amarah Batool et al.

“Abstract: Globally wheat trade has a major and impacting role in political and economic relationships between nations. Twelve pre-green revolution and post green revolution wheat genotypes viz., Sehr-06, Pasban-90, C-273, Pari-73, SA-42, Fsd-08, Chenab-70, Blue Silver, Lasani-08, Pak-81, Uqab-2000, and Pothowar-73 and their direct and reciprocal crosses were evaluated. The study concluded significant differences and highest values in heterosis, heterobeltiosis and potence ratio were found among genotypes and their cross combinations for pollen viability (Sehr-06 × Blue Silver), flag leaf area (SA-42 × Fsd-08), number of grains per spike (Pak-81 × Lasani-08) and grain yield plant⁻¹ (Chenab-70 × Fsd-08). Under changing climatic condition and limited water provision an amalgamation of pre-green revolution and post green revolution may provide a genetic diversity to break the stagnant yield barrier to ensure food security.”
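The three statistics named in the abstract have widely used textbook definitions, sketched below. The paper's own formulas are not quoted here, so treat these as the conventional forms (an assumption); the function names and trait values are hypothetical, and the “better parent” is taken to be the higher-valued one, which suits traits such as grain yield.

```python
def mid_parent(p1, p2):
    """Mean of the two parental values (MP)."""
    return (p1 + p2) / 2.0


def heterosis_pct(f1, p1, p2):
    """Mid-parent heterosis: F1 gain over the parental mean, in percent."""
    mp = mid_parent(p1, p2)
    return 100.0 * (f1 - mp) / mp


def heterobeltiosis_pct(f1, p1, p2):
    """F1 gain over the better (here: higher) parent (BP), in percent."""
    bp = max(p1, p2)
    return 100.0 * (f1 - bp) / bp


def potence_ratio(f1, p1, p2):
    """(F1 - MP) / (BP - MP); values beyond +/-1 indicate over-dominance."""
    mp = mid_parent(p1, p2)
    bp = max(p1, p2)
    if bp == mp:
        raise ValueError("parents are equal; potence ratio is undefined")
    return (f1 - mp) / (bp - mp)


# Hypothetical grain yield per plant (g) for two parents and their F1 hybrid.
p1, p2, f1 = 40.0, 50.0, 54.0
print(heterosis_pct(f1, p1, p2),
      heterobeltiosis_pct(f1, p1, p2),
      potence_ratio(f1, p1, p2))
```

In this made-up case the hybrid exceeds both the parental mean and the better parent, so all three statistics are positive and the potence ratio is above 1.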

In situ of retrotransposon-rich BACs in Helianthus/Sunflower to make physical map

Paper on Helianthus annuus (2n=34) BAC in situ. The BACs have a high content of retrotransposons, but Feng et al. from North Dakota use blocking DNA to prevent distributed hybridization signal, and identify BAC/BIBAC clones hybridizing specifically to each of the 17 chromosomes.

Feng, J., Liu, Z., Cai, X., & Jan, C.-C. (2013). Toward a molecular cytogenetic map for cultivated sunflower (Helianthus annuus L.) by landed BAC/BIBAC clones. G3: Genes – Genomes – Genetics (1), 31-40.


Toward a Molecular Cytogenetic Map for Cultivated Sunflower (Helianthus annuus L.) by Landed BAC/BIBAC Clones

Jiuhuan Feng, Zhao Liu, Xiwen Cai and Chao-Chien Jan

USDA-Agricultural Research Service, Northern Crop Science Laboratory, Fargo, North Dakota

Conventional karyotypes and various genetic linkage maps have been established in sunflower (Helianthus annuus L., 2n = 34). However, the relationship between linkage groups and individual chromosomes of sunflower remains unknown and has considerable relevance for the sunflower research community. Recently, a set of linkage group-specific bacterial/binary bacterial artificial chromosome (BAC/BIBAC) clones was identified from two complementary BAC and BIBAC libraries constructed for cultivated sunflower cv. HA89. In the present study, we used these linkage group-specific clones (>100 kb in size) as probes to in situ hybridize to HA89 mitotic chromosomes at metaphase using the BAC-fluorescence in situ hybridization (FISH) technique. Because a characteristic of the sunflower genome is the abundance of repetitive DNA sequences, a high ratio of blocking DNA to probe DNA was applied to hybridization reactions to minimize the background noise. As a result, all sunflower chromosomes were anchored by one or two BAC/BIBAC clones with specific FISH signals. FISH analysis based on tandem repetitive sequences, such as rRNA genes, has been previously reported; however, the BAC-FISH technique developed here using restriction fragment length polymorphism (RFLP)-derived BAC/BIBAC clones as probes to apply genome-wide analysis is new for sunflower. As chromosome-specific cytogenetic markers, the selected BAC/BIBAC clones that encompass the 17 linkage groups provide a valuable tool for identifying sunflower cytogenetic stocks (such as trisomics) and tracking alien chromosomes in interspecific crosses. This work also demonstrates the potential of using a large-insert DNA library for the development of molecular cytogenetic resources.

International link to paper:

Local link to paper: HelianthusBACinsitu

