Monday, June 26, 2017

Debating alternative splicing (Part III)

Proponents of massive alternative splicing argue that most human genes produce many different protein isoforms. According to these scientists, this means that humans can make about 100,000 different proteins from only ~20,000 protein-coding genes. They tend to believe humans are considerably more complex than other animals even though we have about the same number of genes. They think alternative splicing accounts for this complexity [see The Deflated Ego Problem].

Opponents (I am one) argue that most splice variants are due to splicing errors and most of those predicted protein isoforms don't exist. (We also argue that the differences between humans and other animals can be adequately explained by differential regulation of 20,000 protein-coding genes.) The controversy can only be resolved when proponents of massive alternative splicing provide evidence to support their claim that there are 100,000 functional proteins.

Some scientists are attempting to test the hypothesis by looking for the predicted proteins. One way to do that is to look for them using mass spectrometry. Recently (February 2017), Tress et al. reviewed and reanalyzed eight large-scale experiments and databases. Here's the abstract of their paper ...
Alternative splicing is commonly believed to be a major source of cellular protein diversity. However, although many thousands of alternatively spliced transcripts are routinely detected in RNA-seq studies, reliable large-scale mass spectrometry-based proteomics analyses identify only a small fraction of annotated alternative isoforms. The clearest finding from proteomics experiments is that most human genes have a single main protein isoform, while those alternative isoforms that are identified tend to be the most biologically plausible: those with the most cross-species conservation and those that do not compromise functional domains. Indeed, most alternative exons do not seem to be under selective pressure, suggesting that a large majority of predicted alternative transcripts may not even be translated into proteins.
There are lots of problems with these mass spec experiments. For example, they hardly ever detect all of the peptides of most genes even when the proteins are present in high concentrations. In addition to these well-known false negatives, there are some interesting false positives in the data. The authors are aware of these limitations and they are described and discussed in the paper.

Nevertheless, it is remarkable that tens of thousands of predicted protein variants are not detected in these experiments. The authors conclude ...
Alternative splicing is well documented at the transcript level, and microarray and RNA-seq experiments routinely detect evidence for many thousands of splice variants. However, large-scale proteomics experiments identify few alternative isoforms. The gap between the numbers of alternative variants detected in large-scale transcriptomics experiments and proteomics analyses is real and is difficult to explain away as a purely technical phenomenon. While alternative splicing clearly does contribute to the cellular proteome, the proteomics evidence indicates that it is not as widespread a phenomenon as suggested by transcript data. In particular, the popular view that alternative splicing can somehow compensate for the perceived lack of complexity in the human proteome is manifestly wrong. [my emphasis LAM]
Note: I strongly object to using "alternative splicing" as a synonym for "detection of large numbers of splice variants." I think the term "alternative splicing" should be restricted to genuine examples of real alternative splicing that generate different functional proteins. We should be careful to make this distinction clear in our writing.

The authors review other data on splice variants, noting that they are not conserved and are usually present at low concentrations. Coupling those data with the lack of detection of protein variants, they say ...
The results from large-scale proteomics experiments are in line with evidence from cross-species conservation, human population variation studies, and investigations into the relative effect of gene expression and alternative splicing. Gene expression levels, not alternative splicing, seem to be the key to tissue specificity. While a small number of alternative isoforms are conserved across species, have strong tissue dependence, and are translated in detectable quantities, most have variable tissue specificities and appear to be evolving neutrally. This suggests that most annotated alternative variants are unlikely to have a functional cellular role as proteins. [my emphasis, LAM]
My colleague at the University of Toronto, Ben Blencowe, is a strong supporter of alternative splicing and its role in creating multiple isoforms of most proteins. He wrote a letter to Trends in Biochemical Sciences in which he challenges their results and conclusions. I'll discuss that letter in my next post. The authors of the paper respond to that letter.

If you have questions about the Tress et al. paper, this is a good place to ask them since at least one of the authors (Federico Abascal) reads Sandwalk. I've repeatedly asked proponents of alternative splicing to address the issues I raise here but they have consistently declined to engage in debate. I don't know why they are so reluctant to defend their views.

Finally, allow me to make an important point about massive alternative splicing. This is the view that most human protein-coding genes (~90%) are alternatively spliced to produce multiple functional protein isoforms. This view is just speculation. It is not supported by solid evidence.

The facts are:
  1. Almost all intron-containing genes produce splice variants of various sorts. Most databases show a strong correlation between the size of a gene and the number of variants that have been detected. Most genes have ten or more different variants in the various databases.
  2. Splicing is an error-prone process. The error rate of splicing ranges from 0.01% to 1%. Modern techniques are quite capable of detecting the products of splicing errors and entering their sequences into the databases.
  3. There are some excellent examples of true alternative splicing where the various protein isoforms have been detected and their functions elucidated. There are 35 examples listed in Tress et al. (2017). There may be several hundred examples if you include those with weaker evidence. There are about 20,000 protein-coding genes in the human genome.
  4. Most variant splice sites are not conserved. The same gene in related species can produce a very different pattern of splice variants.
  5. After decades of searching, the vast majority of predicted protein isoforms have not been detected.
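
To get a feel for what the error rates in point 2 imply, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that are not in Tress et al.: that the quoted error rate (0.01% to 1%) applies independently to each intron, and that the intron counts used are purely illustrative. The function name `fraction_misspliced` is hypothetical. Under those assumptions, the fraction of transcripts carrying at least one splicing error is 1 - (1 - rate)^n for a gene with n introns.

```python
def fraction_misspliced(error_rate_per_intron: float, n_introns: int) -> float:
    """Probability that a transcript has at least one splicing error.

    Assumes (for illustration only) that each intron is mis-spliced
    independently with the same probability.
    """
    return 1.0 - (1.0 - error_rate_per_intron) ** n_introns


if __name__ == "__main__":
    # Low and high ends of the error-rate range quoted in the post.
    for rate in (0.0001, 0.01):
        # Intron counts are made-up illustrative values, not measured data.
        for n_introns in (1, 8, 50):
            frac = fraction_misspliced(rate, n_introns)
            print(f"rate={rate:.2%}  introns={n_introns:>2}  "
                  f"transcripts with >=1 error: {frac:.2%}")
```

The point of the sketch is only that, if errors accumulate per intron, genes with more introns should yield more aberrant transcripts, which is at least consistent with the correlation between gene size and number of variants noted in point 1.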
There are some interesting questions that have not been addressed.
  • Most of the predicted protein isoforms postulated by massive alternative splicing proponents make no sense. For example, why would there be multiple isoforms of the standard metabolic enzymes? For proteins involved in large complexes (e.g. RNA polymerase) why would there be multiple isoforms that completely alter the structures of the polypeptides?
  • Why is it necessary to "explain" human complexity by postulating massive alternative splicing? What's wrong with the standard explanation from developmental biology?
  • Is the evolution of massive alternative splicing in a single species, like humans, compatible with our understanding of evolution? How about the presumed expansion in clades such as mammals? Is that more compatible? Is natural selection really so powerful?
  • Why do proponents of massive alternative splicing consistently ignore the possibility that splice variants could be just splicing errors? Why do reviewers of their papers allow them to ignore the main scientific criticism of their views? This is not how science is supposed to work.

Tress, M.L., Abascal, F., and Valencia, A. (2017) Alternative splicing may not be the key to proteome complexity. Trends Biochem. Sci. 42:98-110. [doi: 10.1016/j.tibs.2016.08.008]


  1. Alternative splicing is frequent during early embryonic development in mouse
    Timothée Revil, Daniel Gaffney, Christel Dias, Jacek Majewski and Loydie A Jerome-Majewska†
    †Contributed equally
    BMC Genomics 2010, 11:399
    DOI: 10.1186/1471-2164-11-399

    "Alternative splicing is known to increase the complexity of mammalian transcriptomes since nearly all mammalian genes express multiple pre-mRNA isoforms. However, our knowledge of the extent and function of alternative splicing in early embryonic development is based mainly on a few isolated examples. High throughput technologies now allow us to study genome-wide alternative splicing during mouse development.

    A genome-wide analysis of alternative isoform expression in embryonic day 8.5, 9.5 and 11.5 mouse embryos and placenta was carried out using a splicing-sensitive exon microarray. We show that alternative splicing and isoform expression is frequent across developmental stages and tissues, and is comparable in frequency to the variation in whole-transcript expression. The genes that are alternatively spliced across our samples are disproportionately involved in important developmental processes. Finally, we find that a number of RNA binding proteins, including putative splicing factors, are differentially expressed and spliced across our samples suggesting that such proteins may be involved in regulating tissue and temporal variation in isoform expression. Using an example of a well characterized splicing factor, Fox2, we demonstrate that changes in Fox2 expression levels can be used to predict changes in inclusion levels of alternative exons that are flanked by Fox2 binding sites.

    We propose that alternative splicing is an important developmental regulatory mechanism. We further propose that gene expression should routinely be monitored at both the whole transcript and the isoform level in developmental studies"

    1. Bill, I don't think you've been reading. Differential concentration of splice variants in different cells is not good evidence of function, because it can be a simple effect of differential levels of transcription of the genes and, conceivably, of regulatory sequences involved in true alternative splicing.

      I'm not sure I like the use of "expression" to mean "transcription", though perhaps that's standard. Don't know. But note that this study doesn't, at least from what you quote, assay actual proteins, just RNAs.

      Now of course one could argue that the differently spliced RNAs are functional as RNAs, not as proteins. That would be a fallback position that would be even harder to falsify. Would you like to go there?

    2. A tiny problem with the conclusion is that we should expect accidental variants corresponding to genes whose products are involved in development, because those genes are expressed during development.

      The most convincing piece is that they compare the proportion of variants to that of whole RNA expression. But not convincing enough. Reading the paper would be better, but no time now.

    3. John, "Now of course one could argue that the differently spliced RNAs are functional as RNAs, not as proteins. That would be a fallback position that would be even harder to falsify. Would you like to go there?"

      My point in posting this is that embryo development needs to be explored while trying to solve the junk or splicing puzzle. Other papers show unique AS protein variants expressed uniquely during development.

    4. So all you're saying is that different proteins are expressed at different times? But you're citing a paper that doesn't look at proteins.

  2. I think this paper shows AS proteins whose variants are used in development.

    Alternative Splicing Produces Nanog Protein Variants with Different Capacities for Self-renewal and Pluripotency in Embryonic Stem Cells
    Satyabrata Das, Snehalata Jena and Dana N. Levasseur. October 3, 2011
    doi: 10.1074/jbc.M111.290189

    "In the present study, we analyzed the genomic neighborhood surrounding the Nanog gene locus for evidence of an expanded Nanog gene structure. We identified novel sequences from ES cells that extend the 5′ region of the known Nanog gene. Two additional new exons and 6 different subexons are differentially processed from alternative splicing. We find that this post-transcriptional regulation results in two new Nanog protein variants and we explore the function of these variants in ES cell self-renewal and pluripotency. Our studies reveal evidence that the first 25 amino acids of the NTD of Nanog are essential for both ES cell pluripotency and self-renewal. Finally, we show that a single serine residue in the NTD of Nanog (Ser-2) is essential for the maintenance of the undifferentiated ES cell state."

    1. It does appear to. Of course everyone here agrees that there are some alternatively spliced proteins. The question is how many of them there are.

      By the way, when citing a paper, you need the journal, volume, and page numbers.

    2. It's a simple aspect of logic that seems to escape creationists when they try to discuss molecular biology. Demonstrating a single black swan does not mean that all swans are black, or even that black swans are common.

    3. "It does appear to. Of course everyone here agrees that there are some alternatively spliced proteins. The question is how many of them there are."

      I don't know how many there are, but I am suggesting that embryo development is a place to look because cells are very active at this stage, especially during cell division. A measurement of low-expression "noise" is not relevant until you have measured all stages of the cell from initial growth to maturity.

    4. I did a rough estimate based on the proportion of (mutually exclusively spliced) homologous exons of ancient origin that were detected with proteomics and the proportion of alternative splicing events detected with proteomics that correspond to ancient homologous exons. The estimate yielded between 1000 and 2000 alternative protein isoforms (i.e. not more than a few thousand). But this is a very rough estimate, not reliable!