Sandwalk: How much of the human genome is devoted to regulation?

Friday, August 25, 2017

How much of the human genome is devoted to regulation?

All available evidence suggests that about 90% of our genome is junk DNA. Many scientists are reluctant to accept this evidence—some of them are even unaware of the evidence [Five Things You Should Know if You Want to Participate in the Junk DNA Debate]. Many opponents of junk DNA suffer from what I call The Deflated Ego Problem. They are reluctant to concede that humans have about the same number of genes as all other mammals and only a few more than insects.

One of the common rationalizations is to speculate that while humans may have "only" 25,000 genes they are regulated and controlled in a much more sophisticated manner than the genes in other species. It's this extra level of control that makes humans special. Such speculations have been around for almost fifty years but they have gained in popularity since publication of the human genome sequence.

In some cases, the extra level of regulation is thought to be due to abundant regulatory RNAs. This means there must be tens of thousands of extra genes expressing these regulatory RNAs. John Mattick is the most vocal proponent of this idea and he won an award from the Human Genome Organization for "proving" that his speculation is correct! [John Mattick Wins Chen Award for Distinguished Academic Achievement in Human Genetic and Genomic Research]. Knowledgeable scientists know that Mattick is probably wrong. They believe that most of those transcripts are junk RNAs produced by accidental transcription at very low levels from non-conserved sequences.

I agree with those scientists but for the sake of completeness here's what John Mattick believes about regulation.

Discoveries over the past decade portend a paradigm shift in molecular biology. Evidence suggests that RNA is not only functional as a messenger between DNA and protein but also involved in the regulation of genome organization and gene expression, which is increasingly elaborate in complex organisms. Regulatory RNA seems to operate at many levels; in particular, it plays an important part in the epigenetic processes that control differentiation and development. These discoveries suggest a central role for RNA in human evolution and ontogeny. Here, we review the emergence of the previously unsuspected world of regulatory RNA from a historical perspective.

... The emerging evidence suggests that there are more genes encoding regulatory RNAs than those encoding proteins in the human genome, and that the amount and type of gene regulation in complex organisms have been substantially misunderstood for most of the past 50 years. (Morris and Mattick, 2014)

The evidence does not support the claim that there are more than 20,000 genes for regulatory RNAs. It's more consistent with the idea that most transcripts are non-functional.

There's another speculation related to regulation. This one was promoted by ENCODE in their original 2007 preliminary study and later on in the now-famous 2012 papers. The ENCODE researchers identified thousands of putative regulatory sites in the genome and concluded ...

... even using the most conservative estimates, the fraction of bases likely to be involved in direct gene regulation, even though incomplete, is significantly higher than that ascribed to protein-coding exons (1.2%), raising the possibility that more information in the human genome may be important for gene regulation than for biochemical function.

They go on to speculate that 8.5% of the genome may be involved in regulation. Think about that for a minute. If we assume that each site covers 100 bp, then the ENCODE researchers are speculating that there might be more than 2 million regulatory sites in the human genome! That's about 100 regulatory sites for every gene!
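The back-of-the-envelope arithmetic above can be checked in a few lines. This is a minimal sketch that uses the post's own numbers: 8.5% of the genome, 100 bp per site, and 25,000 genes. It also assumes a haploid genome size of about 3.2 billion bp, a figure the post does not state.

```python
# Back-of-the-envelope check of the 8.5% "regulatory" estimate.
# ASSUMPTION: haploid human genome size of ~3.2 billion bp (not stated in the post).
GENOME_BP = 3_200_000_000
REGULATORY_PER_MILLE = 85   # 8.5% expressed per thousand, to keep the math in integers
SITE_LENGTH_BP = 100        # assumed length of one regulatory site
GENE_COUNT = 25_000         # approximate number of human genes

regulatory_bp = GENOME_BP * REGULATORY_PER_MILLE // 1000
site_count = regulatory_bp // SITE_LENGTH_BP
sites_per_gene = site_count / GENE_COUNT

print(f"Regulatory DNA: {regulatory_bp:,} bp")
print(f"Implied sites:  {site_count:,}")
print(f"Sites per gene: {sites_per_gene:.0f}")
```

With these inputs, the sketch gives 272 million bp, about 2.7 million sites, and roughly 109 sites per gene. These figures agree with "more than 2 million" and "about 100 regulatory sites for every gene."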

This is absurd. There must be something wrong with the data.

It's not difficult to see the problem. The assays used by ENCODE are designed to detect transcription factor binding sites, places where histones have been modified, and sites that are sensitive to DNase I. These are all indicators of functional regulatory sites but they are also likely to be associated with non-functional sites. For example, transcription factors will bind to thousands of sites in the genome that have nothing to do with regulation [Are most transcription factor binding sites functional?].
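One rough way to see why transcription factors should find thousands of non-functional binding sites is to count how often a short recognition sequence is expected to occur by chance. This is a simplified sketch resting on several assumptions:

- random sequence with equal base frequencies
- exact matches to a hypothetical motif of length k
- both strands searched, with the motif not palindromic
- a genome of about 3.2 billion bp

Real binding tolerates some mismatches, which would raise these counts, while closed chromatin would lower them.

```python
# Expected number of chance occurrences of an exact DNA motif in random sequence.
# ASSUMPTIONS: equal base frequencies (0.25 each), independent positions,
# exact matches only, both strands counted, non-palindromic motif.

def expected_random_matches(genome_bp: int, motif_length: int, strands: int = 2) -> float:
    """Return the expected number of exact motif matches in random sequence."""
    positions = genome_bp - motif_length + 1      # possible start positions per strand
    match_probability = 0.25 ** motif_length      # chance that one position matches
    return strands * positions * match_probability

GENOME_BP = 3_200_000_000  # ASSUMPTION: approximate haploid human genome size

for k in (6, 8, 10, 12):
    print(f"{k:>2}-bp motif: ~{expected_random_matches(GENOME_BP, k):,.0f} chance sites")
```

Even a fairly long 10-bp motif is expected to turn up about 6,000 times by chance under these assumptions. That is the same order of magnitude as the "thousands of sites" in the paragraph above.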

It's very likely that spurious transcription factor binding will lead to histone modification and DNase I sensitivity due to the loosening of chromatin. What this means is that these assays don't actually detect regulatory sites or enhancers as ENCODE claims. Instead, they detect putative regulatory sites that have to be confirmed by additional experiments.

The scientific community is gradually becoming more and more skeptical of these over-interpreted genomic experiments.

The latest genomics paper on regulatory sites has just been posted on bioRxiv (Benton et al., 2017). This is a preprint server. The paper has not been peer-reviewed and accepted by a scientific journal but it's still making a splash on Twitter and the rest of the internet.

Here's the abstract ...

Non-coding gene regulatory loci are essential to transcription in mammalian cells. As a result, a large variety of experimental and computational strategies have been developed to identify cis-regulatory enhancer sequences. However, in practice, most studies consider enhancer candidates identified by a single method alone. Here we assess the robustness of conclusions based on such a paradigm by comparing enhancer sets identified by different strategies. Because the field currently lacks a comprehensive gold standard, our goal was not to identify the best identification strategy, but rather to quantify the consistency of enhancer sets identified by ten representative identification strategies and to assess the robustness of conclusions based on one approach alone. We found significant dissimilarity between enhancer sets in terms of genomic characteristics, evolutionary conservation, and association with functional loci. This substantial disagreement between enhancer sets within the same biological context is sufficient to influence downstream biological interpretations, and to lead to disparate scientific conclusions about enhancer biology and disease mechanisms. Specifically, we find that different enhancer sets in the same context vary significantly in their overlap with GWAS SNPs and eQTL, and that the majority of GWAS SNPs and eQTL overlap enhancers identified by only a single identification strategy. Furthermore, we find limited evidence that enhancer candidates identified by multiple strategies are more likely to have regulatory function than enhancer candidates identified by a single method. The difficulty of consistently identifying and categorizing enhancers presents a major challenge to mapping the genetic architecture of complex disease, and to interpreting variants found in patient genomes. 
To facilitate evaluation of the effects of different annotation approaches on studies' conclusions, we developed a database of enhancer annotations in common biological contexts, creDB, which is designed to integrate into bioinformatics workflows. Our results highlight the inherent complexity of enhancer biology and argue that current approaches have yet to adequately account for enhancer diversity.
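One finding in the abstract is that most GWAS SNPs and eQTL fall in enhancers identified by only a single strategy. Below is a minimal sketch of how such a tally could be computed. It assumes that each strategy's enhancer calls are half-open (start, end) intervals on a single chromosome. The strategy names, coordinates, and SNP positions are hypothetical toy values, not data from Benton et al., and the paper's actual pipeline may differ.

```python
from collections import Counter

# HYPOTHETICAL toy data: enhancer calls per identification strategy as
# half-open [start, end) intervals on one chromosome. Not real data.
ENHANCER_SETS = {
    "strategy_A": [(100, 300), (900, 1100)],
    "strategy_B": [(250, 400)],
    "strategy_C": [(1500, 1700)],
}
SNP_POSITIONS = [150, 260, 950, 1600, 2000]  # HYPOTHETICAL variant positions

def strategies_covering(pos, enhancer_sets):
    """Return the names of strategies with an enhancer call containing pos."""
    return [name for name, intervals in enhancer_sets.items()
            if any(start <= pos < end for start, end in intervals)]

def support_histogram(snps, enhancer_sets):
    """Count SNPs by how many strategies place them inside an enhancer."""
    return Counter(len(strategies_covering(p, enhancer_sets)) for p in snps)

hist = support_histogram(SNP_POSITIONS, ENHANCER_SETS)
in_any = sum(n for k, n in hist.items() if k > 0)
print(dict(sorted(hist.items())))
print(f"Enhancer-overlapping SNPs supported by one strategy: {hist[1] / in_any:.2f}")
```

In this toy example, three of the four SNPs that fall in any enhancer are supported by only one strategy.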

The authors looked at several ENCODE databases identifying sites of histone modification and DNase I sensitivity as well as sites that are transcribed. They specifically looked at databases predicting functional enhancers based on these data. What they found was very little correlation between the various databases and predictions of functionality. When they looked at independent assays using the same cell lines they found considerable variation and a surprising lack of correlation.
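Agreement between two enhancer sets can be illustrated with a base-pair Jaccard index: the number of bases called by both sets divided by the number called by either. This is a sketch under stated assumptions: a single chromosome, half-open intervals, and hypothetical coordinates. It is one common overlap measure, not necessarily the exact statistic Benton et al. used.

```python
# Base-pair Jaccard similarity between two sets of genomic intervals.
# ASSUMPTIONS: half-open [start, end) intervals on a single chromosome;
# the coordinates below are HYPOTHETICAL toy values.

def merge(intervals):
    """Sort intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def covered_bp(intervals):
    """Total bases covered by a merged interval list."""
    return sum(end - start for start, end in intervals)

def intersection_bp(a, b):
    """Bases shared by two merged, sorted interval lists (two-pointer sweep)."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared

def jaccard_bp(set_a, set_b):
    """Shared bases divided by bases covered by either set."""
    a, b = merge(set_a), merge(set_b)
    inter = intersection_bp(a, b)
    union = covered_bp(a) + covered_bp(b) - inter
    return inter / union if union else 0.0

histone_calls = [(100, 300), (900, 1100), (2000, 2200)]  # HYPOTHETICAL
dnase_calls = [(250, 400), (1000, 1050), (5000, 5100)]   # HYPOTHETICAL
print(f"Jaccard (bp): {jaccard_bp(histone_calls, dnase_calls):.3f}")
```

Here the two toy sets share only 100 of the 800 bases either one covers, giving a Jaccard index of 0.125. Identical sets would score 1.0.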

While this lack of correlation does not prove that the sites are non-functional, it does indicate that you shouldn't just assume that these sites identify real functional enhancers (regulatory sites). In other words, skepticism is the appropriate stance.

But that's NOT what the authors conclude. Instead, they assume, without evidence, that every assay identifies real enhancers, and they conclude that the data reveal an incredible diversity of functional enhancers.

... we believe that ignoring enhancer diversity impedes research progress and replication, since, "what we talk about when we talk about enhancers" include diverse sequence elements across an incompletely understood spectrum, all of which are important for proper gene expression. [my emphasis - LAM]

I find it astonishing that the authors don't even discuss the possibility that they may be looking at spurious sites that have nothing to do with biologically functional regulation. Scientists can find all kinds of ways of rationalizing the data when they are convinced they are observing function (confirmation bias). In this case, the data tells them that many of the sites do not have all of the characteristics of actual regulatory sites. The obvious conclusion, in my opinion, is that the sites are non-functional, just as we suspect from our knowledge of basic biochemistry.

True believers, on the other hand, arrive at a different conclusion. They think this data shows increased complexity and mysterious functional roles that are "incompletely understood."

I hope reviewers of this paper will force the authors to consider spurious binding and non-functional sites. I hope they will force the authors to use "putative enhancers" throughout their paper instead of just "enhancers."

Benton, M.L., Talipineni, S.C., Kostka, D., and Capra, J.A. (2017) Genome-wide Enhancer Maps Differ Significantly in Genomic Distribution, Evolution, and Function. bioRxiv. [doi: 10.1101/176610]

Morris, K.V., and Mattick, J.S. (2014) The rise of regulatory RNA. Nature Reviews Genetics, 15:423-437. [doi: 10.1038/nrg3722]

6 comments :

S Johnson said...: If mRNAs are so fundamental to epigenetic and developmental processes, shouldn't the array of mRNAs vary in different tissues? Neurons would have a different mix of mRNAs than cardiac muscle cells. And, if the numbers and varieties do not vary in this way, isn't that prima facie evidence these are random, non-functional accidental transcriptions?; Saturday, August 26, 2017 9:11:00 AM
Jmac said...: Meet 'Dark DNA' - The Hidden Genes That May Change How We Think About Evolution

Is Dark DNA hiding in the so-called junk DNA?

https://www.sciencealert.com/introducing-dark-dna-the-phenomenon-that-could-change-how-we-think-about-evolution

https://en.wikipedia.org/wiki/Biological_dark_matter; Saturday, August 26, 2017 11:25:00 PM
Larry Moran said...: You seem to enjoy tilting at windmills. Why?; Sunday, August 27, 2017 7:47:00 AM
Jmac said...: "You seem to enjoy tilting at windmills. Why?"

I think you got me confused with the ENCODE people and the like...

I'm just trying to get to the truth just as science should be...; Monday, August 28, 2017 5:52:00 PM
unknowing said...: Different cell types utilize different sets of transcription factors to execute their gene expression programs. For every transcription factor there exists a set of functional and a set of spurious binding sites to which that factor can bind to facilitate transcription. Because different sets of transcription factors will bind different sets of spurious as well as functional sites, the set of spurious transcripts is, like the set of functional transcripts, expected to vary with cell type.; Wednesday, August 30, 2017 5:38:00 PM
Unknown said...: If one considers a 3D nuclear chromatin structure made by many genomic/epigenomic interactions that is unique to each cell type under specific circumstances, then maybe several million sites have to be required.; Tuesday, January 16, 2018 10:48:00 AM

