12/13/2013 @ 12:35PM
Human DNA Is Not A Document, It's An App
Yesterday, scientists at the University of Washington announced what they characterized as an important breakthrough in our understanding of the nature of DNA. Led by genome scientist Dr. John Stamatoyannopoulos, the team discovered evidence of a "second code" written into DNA. This code controls what are called transcription factors (TFs), which regulate the flow (or transcription) of genetic information from DNA to messenger RNA, which in turn manages the synthesis of the proteins described in the DNA.
Transcription factors are nothing new; they have been an object of study for more than 20 years. What is claimed to be new here, according to the paper published in the journal Science yesterday, is that "~15% of human codons are dual-use codons ('duons') that simultaneously specify both amino acids and TF recognition sites." Codons are the nucleotide triplets (nucleotides being the building blocks of the nucleic acid that is the "N" in DNA) that specify which amino acid to add next in the process of synthesizing a protein. Of particular significance is that the experiments were carried out on cell lines from the human exome, the 1% of the genome that remains inside mature RNA and is thought to harbor 85% of disease-causing mutations.
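For readers who think in code, here is a minimal sketch of what "dual-use" means: the same string of letters is read once in triplets, as codons, and once as a raw sequence scanned for a protein-binding pattern. The example sequence, the motif `TGACGT`, and the function names `translate` and `find_duons` are assumptions chosen for illustration, not taken from the Science paper. Real TF recognition sites are identified experimentally and are not simple exact string matches. Only the five codon-to-amino-acid pairs come from the standard genetic code.

```python
# Partial codon table from the standard genetic code; only the codons used below.
CODON_TABLE = {
    "ATG": "Met",
    "GTG": "Val",
    "ACG": "Thr",
    "TCA": "Ser",
    "TAA": "Stop",
}

# Hypothetical TF recognition motif, an assumption chosen for this example.
TF_MOTIF = "TGACGT"


def translate(dna):
    """Read the sequence three letters at a time and look up each codon."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return [(codon, CODON_TABLE.get(codon, "?")) for codon in codons]


def find_duons(dna, motif=TF_MOTIF):
    """Return the indices of codons that overlap an occurrence of the motif."""
    overlapping = set()
    start = dna.find(motif)
    while start != -1:
        last_base = start + len(motif) - 1
        # Every codon touched by the motif, from its first base to its last.
        overlapping.update(range(start // 3, last_base // 3 + 1))
        start = dna.find(motif, start + 1)
    n_codons = len(dna) // 3
    return sorted(i for i in overlapping if i < n_codons)


# Made-up sequence: five codons with the motif embedded across codons 1 to 3.
sequence = "ATGGTGACGTCATAA"
duons = set(find_duons(sequence))
for index, (codon, amino_acid) in enumerate(translate(sequence)):
    role = "amino acid + TF site" if index in duons else "amino acid only"
    print(codon, amino_acid, role)
```

In this toy case the three middle codons do double duty: each still spells an amino acid, and together they also spell the binding pattern. A single-letter change in that stretch could alter both readings at once, which is one way to picture why mutations in duons matter.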
As my colleague Emily Willingham makes clear (see link above), this may not be all that it is cracked up to be. The salient fact is that "the genome contains more of these dual-use DNA sequences than previously thought." Let's accept the fact that this is not a discovery that schools Francis Crick and James Watson, but does it change the way people will think about DNA? Changing the way people think is the coin of the realm in technology because, even more than science, it trades on the new. With anything truly new, we need analogies as prosthetics before a new paradigm is accepted. When Crick and Watson announced their discoveries about the structure and function of DNA in 1953, it took years before the general public had any idea what the double helix really meant.
Half a century later we think about code more as computer code, and the publicity around the U. Washington research may be a good time to bring popular notions of how DNA works up to date. Willingham also explains that it is a commonplace that "the DNA sequence both contains code for proteins and serves a regulatory purpose." Rather than being a "second code," as the University's article states, it is really "a different (but already recognized) use of the existing code, now identified as occurring at a greater frequency in areas that use the same code for proteins."
Accepting all that, the idea of a code within a code is a powerful cultural meme that once applied to DNA may not go away so fast. And perhaps should not. The fact that this discovery may be overstated doesn't mean that it doesn't capture something interesting about the way the world works. A lot of great abstract art in the early 20th century was the result of misunderstanding contemporary physics! With that in mind, I have prepared seven metaphors to help us understand the implications of this concept. I'm thinking of the classic elevator pitch where you say that the new thing is like one or more things we already know, but with a crucial difference.
DNA As An App
DNA Contains Puns
One of the most startling things about the dual nature of the duons is that it makes us wonder what else could be hiding within those double helixes. The duons are like words or phrases within the "text" of our DNA that mean two different things, depending on context. And yet, as in human language, not all words have this double meaning. The Science paper suggests that these duons are "highly conserved" through evolution, which means that they are the Darwinian keepers. But as with puns and other figures of speech, the duons' power also contains the danger of miscommunication, since mutations within them are highly likely to lead to disease.
DNA As Zappos
Another concept of DNA has been that it is a factory for making proteins. But in the present context DNA appears to be more like Zappos or Amazon's distribution hubs or the UPS Sort. DNA contains the instructions for making all of the proteins that a body needs, yes, but it also choreographs the elaborate logistical dance that delivers the right proteins to the right location at just the right time.
DNA As Playing Battleship
The dual-functioning nature of some DNA code puts us in a curious position when trying to determine, for instance, genetic factors of disease (think of the troubles of 23andMe). In the present scenario, it is as if we have been playing a game of Battleship looking at the known positions of our pieces (the genes), but not at the positions of the duons that contain the code of the TFs. And it turns out that the places where genes associated with a given disease coincide with co-located TF sites will be likely spots at which to aim the big guns.
We have just gotten through the human genome and perhaps we need to start again! Hidden within what scientists have already decoded is at least one more layer of code. It is interesting that at the time that Watson and Crick came out with their discovery, another DNA researcher, Rosalind Franklin, argued that other structures could satisfy the x-ray data for DNA and wondered if theirs was "the solution or a solution." More than 50 years later it seems that perhaps they were all partly right.
DNA As Language
Intriguing for those with a familiarity with quantitative linguistics and current work on natural language processing is how the instance of a dual-functioning code in DNA begins to make genetics seem more like a language, with all the messiness that entails, and less like a pure instruction set. If certain sequences have an ambiguous identity and shift modes between description and process, there is room for many errors and chaotic recursions to arise, just as in human language.
DNA As Scaling Mechanism
Finally, DNA is a model of natural economy. There are a limited number of sequences that can be generated from the base amino acids and a limited number of transcription factors that can be deployed in combinations to yield all of the complexity of a living, breathing human being. Nature, I think, is simplicity scaled. And what the inclusion of the TFs within the DNA code looks like to me is that the code contains both the description of the proteins it can create and the methods to scale those proteins up from a zygote all the way to an adult and through an entire life cycle. Integrating these two processes, specifying and scaling, into the same nucleotide sequences is a conceptually elegant way of tying together DNA's dual nature. I wonder how that would work with software?