The Evolution of Protein Folding: Is a Crisis Brewing for Darwin?
Historically speaking, there is a distinction to bear in mind between puzzles that merely challenge a scientific theory and puzzles that turn into a crisis. The Michelson-Morley experiment in the late 19th century proved to be a crisis for classical physics. So did black-body radiation. The former led to Einstein’s special theory of relativity, the latter to quantum mechanics. Both involved radically new ways of visualizing space and time that could not be avoided if, in Einstein’s case, classical mechanics was to be reconciled with Maxwell’s electrodynamics and, in Planck’s case, sense was to be made of all the observational data on radiation. On the other hand, up to the end of the 19th century, Newtonian physics had weathered many puzzles that required only some refinement of the theory.
Darwin’s theory has also faced its share of puzzles (and continues to). Before the advent of genetics in the early 20th century, for example, natural selection was looking like something far worse than a puzzle for evolution. Then population genetics grew as a field, and the work of specialists such as Simpson and Dobzhansky (to name just two) established firmer grounds for natural selection.
Still, a crisis is what many skeptics of evolution thirst for, and, as often happens, what you’d like to see can blind you to what is actually there (or not there). Proponents of Intelligent Design think it’s the complexity of the bacterial flagellum that cannot be explained in terms of genetic variation and natural selection.
I was struck by a comment made a while back related to proteins. It all started with Francis Beckwith’s post at What’s Wrong with the World on the incompatibility between Aquinas and Intelligent Design.
The WWWTW blog is self-consciously modeled on Chesterton’s classic essay collection of the same name (and in fact I have a first American edition, 1910, soon approaching its 100th birthday and in very good condition). And while it is encouraging that Aquinas and Intelligent Design don’t fit, it remains odd to me that the hostility many academics of Catholic, mainline Protestant and Orthodox traditions have for evolution is subtler but not fundamentally different from that of, well, fundamentalists and the more overt intelligent design proponents. Which is to say: an always negative tendency to attack scientists for what they don’t know yet. For all the adherence to Aquinas and his arguments from secondary causes, it seems many can’t resist falling into the God of the Gaps reasoning implied by the natural theology of Protestant William Paley. (Whatever happened to checking in with Cardinal Newman?)
For example, apropos of a quip by Lydia McGrew dismissing the use of computer models for evolution (“Just amazing what you can do when ‘seeing’ computer programs ‘evolve’ rather than dealing with actual biological entities. If that counts as ‘scientists have shown’ I have several bridges to sell them.”), fellow What’s Wrong With the World blogger (and, I’m green with envy to say, instrument-rated private pilot) Zippy followed up:
This is an important point. The computer models that computational biologists use bear (or at least bore, a few years back when I was studying this at the graduate level, and still bear every time I do the due diligence) very little resemblance to what is actually going on in physical reality. I’ve mentioned this before, but here it is again: as far as we know random polypeptide chains of any significant length don’t fold into stable native states under physiological conditions at all, let alone fold into nontoxic stable native states, let alone fold into stable native states which perform a useful function which can provide fodder for natural selection, let alone do all that and result in wholly new kinds of proteins, cell types, tissues, organs, or species. And all-atom computer models of hundred-residue chains don’t even exist: they are well beyond the compute power available to present day researchers. Computerized protein structure predictions are based on lookup-table statistical analysis of homologes (I know, I had to do some in order to pass a bioinformatics course), not on any kind of at all even remotely workable model of what is actually taking place at the molecular level.
The victory party is still very, very, very premature; but if the neo-Darwinists don’t keep holding it, someone might get the idea that they’ve been doing nothing but blowing smoke for a century or two for reasons that don’t have much to do with a dispassionate search for the truth. And we can’t have that.
By this reasoning, evolution is apparently worse than an empty suit, prematurely being celebrated by scientists doing nothing. The assertion here seems to be that no actual progress is being made on what amounts to a major problem for evolutionary biology.
Is a crisis in the offing? As we’ll see, the answer is no. But it is a challenge, and a fascinating one that, to this layman’s eye, looks bound to lead to more fruitful discoveries.
So, let’s start with the computer models. Mark Pallen, professor of Microbial Genetics at the University of Birmingham and author of the Rough Guide to Evolution, tells me, “Computer models are obviously simpler than reality and one could not establish from first principles by computer modeling the evolutionary pathways that led to the first proteins, nor model every possible structure in sequence space.”
“But,” he adds, “this is a bit like saying you can never understand the architecture of a church without an atomic resolution model of all the materials and components that make it up. Or that because we cannot model every atom in the atmosphere, we have no understanding of the weather and cannot make useful weather forecasts. While we may not be able to predict the folded structure of a protein from its sequence, let alone of every 100 amino acid protein in protein sequence space, that does not mean we cannot perform experiments or make observations that inform our understanding of early protein evolution.”
According to Nick Matzke, a researcher at the Huelsenbeck Lab, Center for Evolutionary Genomics at U.C. Berkeley, “the processes that we think produce new genes/proteins etc. are not equivalent to random-assembly-all-at-once-from-scratch… We have duplication, modification, selection, rearrangement, etc.
“Even the very first polypeptides were pretty certainly not assembled all-at-once-from-scratch from a pool of 20+ kinds of amino acids in even proportions, in D- and L-form, as creationists and various benighted physicists blithely assume. Probably the first time a proto-tRNA grabbed an amino acid and made a short chain, the chain was composed of glycine and few common hydrophobic amino acids and was quite short. Cavalier-Smith (2001) suggests that the original function may have just been a hydrophobic tail for association with a membrane. All of the improbability statistics are irrelevant in this sort of scenario, chirality isn’t an issue, etc.”
This is in line with the current research, for example, of Professor Andrei N. Lupas, director of the Department of Protein Evolution at the Max-Planck-Institute for Developmental Biology in Tübingen.
According to Prof. Lupas, “The problem arises from the fact that random polypeptide chains indeed essentially do not fold (I would estimate the proportion to about 1:10^20 for polypeptides in the range between 70 to 120 residues). Clearly abiotic systems cannot produce the starting material for a random exploration of folding space (never mind the problem of passing on the information on anything useful you encountered) and it beggars belief that biotic systems could emerge that produce 99.99999999999999999% trash for an initially barely selectable benefit.”
But this is hardly a reason to toss out the principles of evolutionary biology. According to Prof. Lupas: “The solution obviously is to propose that an initial RNA world used peptides for other purposes, in which folding was not an issue, but that it selected for peptides that could become structured upon encountering an RNA scaffold (there is ample evidence that there is a natural affinity between peptides and nucleic acids and that random peptides have a tendency to bind into the grooves, becoming structured through the exclusion of water). The issue then becomes to explain how a set of (non-folding) peptides could yield (folding) polypeptides under natural selection.
“In my department at the MPI in Tübingen, we explore the hypothesis that folded proteins indeed arose from this preselected pool of peptides, through amplification, fusion and recombination. By being written into one chain, these peptides preselected for the ability to form secondary structures would have found that in many cases they could now exclude water between each other, without the need for an RNA scaffold. Folding would thus be an emergent property resulting from the increased length and complexity of peptides. If this is true, then we think we should be able to reconstruct this vocabulary of peptides in the same way in which ancient languages such as indo-European have been reconstructed through the comparison of modern languages.”
Two of Lupas’ recent papers are here:
On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world?
Lupas AN, Ponting CP, Russell RB. J Struct Biol. 2001 May-Jun;134(2-3):191-203.
More than the sum of their parts: on the evolution of proteins from peptides.
Söding J, Lupas AN. Bioessays. 2003 Sep;25(9):837-46.
Professor Lupas also contributed a chapter to Computational Structural Biology, published last September, which is devoted to the evolution of protein folds. Here’s a snippet worth quoting at length from the end of the chapter:
Proteins may have originated by the repetition of short peptides, a process that efficiently yields fibrous proteins such as coiled coils and β-helices [39,40]. Repetitive sequences appear to have a higher chance of folding and also more favorable structural properties than nonrepetitive sequences [41,42]. The problem of passing on the sequence information, however, remains unsolved. Also, domains seen today do not have fibrous elements at their core; there is a discontinuity in fold complexity between fibers and all other folded domains and fibers are structural, not catalytic elements, whereas the primary role of proteins is catalysis.
We favor a scenario for the origin of proteins by fusion and recombination from an ancestral set of peptides, which emerged in the context of RNA-dependent replication and catalysis (the “RNA world”) [15]. These peptides, originally short chains of abiotic origin, would have been selected as co-factors of ribozymes, broadening their catalytic spectrum and improving their stability and folding efficiency. As the abiotic pool became depleted, ribozyme-based organisms developed an evolutionary incentive to ligate peptides catalytically, and later also to establish a primitive code so as to increase the yield of useful peptides. The need for improved specificity provided the evolutionary pressure for the emergence of peptides capable of assuming secondary structure on an RNA scaffold. The assembly of longer polypeptide chains from these pre-optimized peptides led to folding as an emergent property, when peptides found that they could now exclude water between themselves (“hydrophobic collapse”) in the absence of an RNA scaffold. The dominant role of recurrent supersecondary structures in the architecture of modern folds [43] may be the result of this process.
Whatever the mechanism, it appears to have ceased a long time ago, since the basic complement of proteins in living beings has not been enriched by new folds for hundreds of millions of years and has probably been essentially stable since the time of the last common ancestor. Why is that? Did nature find most islands of stability available to the 20 natural alpha-amino acids in one burst around 3.8–3.5 billion years ago? Or is it that, once a set of folded and functional proteins was in place, no new exemplars could emerge across the complexity boundary imposed by the twin constraints of structure and function, without being eliminated immediately by established competitors? The issues resemble the questions surrounding animal bodyplans. These also emerged in a comparatively short time (the “Cambrian explosion”) and only a very limited number became established. Even though new opportunities arose periodically through large-scale extinction events, none led to the emergence of new body-plans; rather, the openings were filled by survivors with the same or similar body plans as the extinct species.
From the other side of the world, Ian Musgrave, professor at the University of Adelaide in Australia, writes, “as others have already said, proteins probably didn’t arise from random assembly of 100+ amino acids in one go in the first place.” But they didn’t need to. He cites, among others, these two papers:
Keefe AD, Szostak JW. Functional proteins from a random-sequence library. Nature. 2001 Apr 5;410(6829):715-8.
Hayashi Y, Sakata H, Makino Y, Urabe I, Yomo T. Can an arbitrary sequence evolve towards acquiring a biological function? J Mol Evol. 2003 Feb;56(2):162-8. (Musgrave: “The answer is yes.”)
Keefe and Szostak are optimistic about their progress:
Our isolation of new functional proteins shows that it should be possible to obtain an unbiased view of the inherent diversity of all possible protein structures, and to determine whether biological proteins represent only a small subset of this diversity. Comparing the sequences of our newly evolved ATP-binding proteins with biological ATP-binding proteins has not revealed any significant similarity; structural data will also be required to reveal whether these proteins, especially the Zn2+ metalloprotein, are similar to those of any biological proteins.
In conclusion, we suggest that functional proteins are sufficiently common in protein sequence space (roughly 1 in 10^11) that they may be discovered by entirely stochastic means, such as presumably operated when proteins were first used by living organisms. However, this frequency is still low enough to emphasize the magnitude of the problem faced by those attempting de novo protein design.
According to Musgrave, “a modest fraction [of random polypeptides] (somewhere between 1 in 10^8 and 1 in 10^12) have some sort of selectable function.”
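To get a feel for what frequencies of this size mean in practice, here is a rough back-of-the-envelope sketch in Python. It is my own illustration, not anything taken from the papers: the function names are made up, and it assumes every sequence in a random library is an independent draw with the same chance of being functional, which is a simplification.

```python
import math


def prob_at_least_one(freq: float, library_size: int) -> float:
    """Chance that a random library contains at least one functional sequence.

    Assumes (for illustration only) that each sequence is an independent
    draw with probability `freq` of being functional, so the answer is
    1 - (1 - freq) ** library_size. log1p/expm1 keep the arithmetic
    accurate when freq is tiny.
    """
    return -math.expm1(library_size * math.log1p(-freq))


def library_size_for(freq: float, target_prob: float) -> int:
    """Smallest library size giving at least `target_prob` chance of one hit."""
    return math.ceil(math.log1p(-target_prob) / math.log1p(-freq))


if __name__ == "__main__":
    # Frequencies spanning the range quoted in the text (1 in 10^8 to 1 in 10^12).
    for exponent in (8, 11, 12):
        freq = 10.0 ** -exponent
        needed = library_size_for(freq, 0.95)
        print(f"1 in 10^{exponent}: about {needed:.2e} sequences for a 95% chance of a hit")
```

Under those assumptions, a library about as large as the inverse of the frequency has a bit better than even odds of containing a functional sequence, and roughly three times that many pushes the odds to 95%. Large numbers, but of a very different order than the size of sequence space itself.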
These are just a few scientists with whom I raised the question. There are many more making the evolution of protein folding the center of their attention. Far from being a black box embarrassment to evolutionary biology, the evolution of protein folding turns out to be a challenge worthwhile to quite a few specialists.
So where does that leave the assertion of a crisis in the state of protein evolution? To me it seems no different from the discredited irreducible complexity arguments of the ID movement. Because protein folding cannot be fully explained now by the principles of evolutionary biology (i.e., descent with modification by the mechanisms of genetic variation and natural selection), the thinking goes, it must therefore call into question the entire theory.
As I mentioned earlier, I understand why this kind of argument is irresistible to fundamentalist evangelicals. But it still surprises me that academics with a clear tradition of appreciation for Aquinas and secondary causes flirt with it.