by Matthew Cobb
There’s a great XKCD up today:
I initially tw**ted this with the comment “Truth. Biology is impossible”, because the cartoon emphasises that the information DNA contains is way more complicated than most non-biologists imagine. It’s a classic mistake of physical scientists (especially mathematicians and physicists) to think that biology obeys the same kind of rigorous lawfulness as those subjects – when they study biology they soon realise that living things are far more complicated than anything in physics or maths. As Martin Rees, the Astronomer Royal, put it: an insect is more complex than a star.
But thinking about it, I think that although Randall (the author) has his heart in the right place, he’s still suffering from some of the classic physical scientist’s assumptions (he used to be a NASA roboticist). Leaving aside the issue of the conditionality of gene expression (that may be what is meant by ‘feedback and external processing’ in the first panel), the female character explains that ‘DNA is the result of the most aggressive optimisation process in the universe’. It is the comparison of the 3.8 billion years of ‘optimisation’ of DNA with the few years of Google optimisation that makes the character in the hat conclude that ‘biology is impossible’.
This isn’t right. DNA is not subject to ‘the most aggressive optimisation process in the universe’. Our genes are not perfectly adapted and beautifully designed. They are a horrible, historical mess. That is partly what distinguishes biology from physics and maths – it is the outcome of historical processes – evolution and natural selection* – which leave their past traces in the genome.
For reasons we don’t understand, many eukaryotic genes (that is, genes in organisms with a nucleus – so all multicellular organisms and some single-celled forms, too) are sometimes split up, interspersed by apparently meaningless sequences, called ‘introns’. Although the average intron is only 40 bases long, one of the introns in the human dystrophin gene is more than 300,000 bases long! In some rare cases, the intron of one gene can even contain a completely separate, protein-encoding gene.
This isn’t the result of ‘optimisation’: it’s due to the fact that, as François Jacob put it, evolution does not design, it tinkers. It fiddles around with stuff to hand, and as long as it works, that’s all that matters.
We know that only 5% of the human genome encodes proteins (when Francis Crick was working on the meaning of the genetic code in the 1950s, he assumed that’s all that a gene would ever do). We now know that another 5-10% is regulatory DNA, which produces RNA that regulates the activity of other genes. As to the remaining 85% – around 2.7 billion base pairs – it appears mainly to be ‘junk’, which has no apparent function – if it were deleted, it would not affect the fitness of the organism at all.
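To put rough numbers on those percentages, here is a minimal sketch in Python. The total genome size of about 3.2 billion base pairs is an assumption (it is not stated in the paragraph above), and the names `GENOME_BP`, `SHARES` and `base_pairs` are made up for illustration.

```python
# Assumed total genome size in base pairs (not stated in the paragraph above).
GENOME_BP = 3.2e9

# Shares quoted in the text; the regulatory share is given as a 5-10% range.
SHARES = {
    "protein-coding": 0.05,
    "regulatory (low estimate)": 0.05,
    "regulatory (high estimate)": 0.10,
    "apparent junk": 0.85,
}


def base_pairs(share, genome_bp=GENOME_BP):
    """Convert a fractional share of the genome into a base-pair count."""
    return share * genome_bp


for label, share in SHARES.items():
    print(f"{label}: {base_pairs(share) / 1e9:.2f} billion bp")
```

Under that assumed genome size, the 85% share comes out at about 2.7 billion base pairs, which matches the figure quoted above.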
There’s been a lot of argument about this, in particular since the ENCODE project suggested that virtually every bit of our DNA seemed to produce some kind of chemical reaction in a cell, which they argued meant that it was functional. But when scientists synthesised genuinely random bits of DNA, they found that they, too, could produce a reaction. If biochemical activity is produced by much of our DNA, it is indistinguishable from random noise.
I wrote about this in my book Life’s Greatest Secret:
Different species can have substantial differences in the size of their genomes, which do not seem to be related to anything in their ecology or degree of apparent physiological complexity. (…) This problem is called the ‘C-value paradox’ or ‘C-value enigma’ – ‘C’ is the amount of DNA in a genome. Some of these differences may be due to a well-known phenomenon: chunks of genomes can be duplicated during evolution, particularly in plants, which can double their genome size in one generation when chromosome duplication goes slightly awry. Because of factors such as duplication, the variation in genomic size that we see between species resists any overall functional explanation. This is highlighted by what is known jocularly as the onion test: the onion genome contains around 16 billion base pairs, or five times that of a human.
Another example – viruses can insert themselves into our DNA, using our cells to reproduce themselves. Sometimes they end up getting stuck, and are copied over and over. So, for example, the remnants of these invasive viral sequences make up an astonishing 45 per cent of the human genome, with one element, known as Alu, leaving genetic traces that make up around 10 per cent of your DNA.
That isn’t optimisation – it’s a millennia-long history of infection!
In some cases, these viral remnants can actually be repurposed by natural selection – tinkering with a vengeance – and such viral sequences are now thought to be at the origin of one of our most vital organs – the placenta.
On a final note, in some cases, within this amazing noise, there are also astonishing examples of complexity which do indeed appear to be the result of optimisation – and they would boggle the mind of anyone, not just a cocky computer scientist in a hat. In Drosophila there is a gene called Dscam, which is involved in neuronal development and has four clusters of exons (bits of the gene that are expressed – hence exon – in contrast to the apparently inert introns).
Each of these exon clusters can be read by the cell in twelve, forty-eight, thirty-three or two alternative ways, respectively. As a result, the single stretch of DNA that we call Dscam can encode 38,016 different proteins. (For the moment, this is the record number of alternative proteins produced by a single gene. I suspect there are many even more extreme examples.)
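The 38,016 figure is simply the product of the four numbers of alternatives, on the assumption that one alternative from each cluster can be combined freely with any alternative from the others. A minimal Python check (the variable names are made up for illustration):

```python
from math import prod

# Number of alternative readings for each of the four Dscam exon clusters,
# as given in the text.
alternatives_per_cluster = [12, 48, 33, 2]

# Assuming the choice in one cluster is independent of the others, the number
# of distinct proteins is the product of the alternatives.
isoforms = prod(alternatives_per_cluster)

print(isoforms)  # 38016
```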
In other words, DNA is even more complicated than Randall imagines – it is historical, messy, undesigned. And when occasionally it is optimised, the degree of complexity is mind-boggling. Biology is not quite impossible, it is just incredibly difficult!
* These are not the same thing! Evolution is a change in the frequency of a particular allele, or form, of a gene. Many alleles – different DNA sequences – produce no change in any character and are therefore selectively neutral. They can change their frequency without natural selection being involved.
Similarly, natural selection simply means the differential survival of different forms of organism in a species. If the differences in form have no genetic basis then natural selection will not lead to a change in allele frequencies, and therefore evolution. For example, if there was natural selection against pink flamingos, which get their colour from their environment, not their genes, then this would not lead to the evolution of a new form in the population (unless there is a genetic character for tending to go and eat only food that enables them to turn pink).
Darwin’s genius was to realise that evolution by natural selection was a way for adaptations to appear. To get this to occur you need variability in a population, for that variability to have some genetic basis and for it to lead to differential survival. With that combination of three factors, and sufficient time, you end up with the amazing variety of life we have on the planet.
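The point that a neutral allele can change frequency with no natural selection involved can be illustrated with a toy simulation. What follows is a minimal sketch, not a model taken from the post: it assumes a simplified Wright-Fisher-style population of haploid individuals with non-overlapping generations, and the function name `wright_fisher` and all its parameters are made up for illustration. With `fitness_advantage=0.0` the allele is neutral and any change in frequency is due to chance alone; a positive value adds differential survival.

```python
import random


def wright_fisher(pop_size, start_freq, generations,
                  fitness_advantage=0.0, seed=None):
    """Return one allele's frequency in each generation.

    fitness_advantage is the allele's relative advantage (must be > -1);
    0.0 means the allele is selectively neutral.
    """
    rng = random.Random(seed)
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Selection weights the chance of being copied; with an advantage
        # of 0.0 this is just the current frequency (pure chance).
        weighted = freq * (1.0 + fitness_advantage)
        prob = weighted / (weighted + (1.0 - freq))
        # Each individual in the next generation draws its allele at random.
        copies = sum(1 for _ in range(pop_size) if rng.random() < prob)
        freq = copies / pop_size
        history.append(freq)
    return history


# A neutral allele starting at 50% in a small population.
neutral = wright_fisher(pop_size=100, start_freq=0.5, generations=200, seed=1)
print(neutral[0], neutral[-1])

# The same allele with a 5% advantage (differential survival added).
selected = wright_fisher(pop_size=100, start_freq=0.5, generations=200,
                         fitness_advantage=0.05, seed=1)
print(selected[0], selected[-1])
```

In runs like this the neutral allele's frequency wanders from generation to generation, which is evolution in the footnote's sense even though no selection is acting on it.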