“Sunday Morning in the Mines” by Charles Christian Nahl (1818 – 1878)
An oil painting on canvas. Crocker Art Museum, Sacramento, California
The California gold rush of 1849 was significant event for wealth creation and technological progress. There were three phases in this revolution which determined wealth creation, and could be ranked by the difficulty of obtaining wealth.
Mining gold and the three stages
- Easy: Initially gold was found simply by looking at the surface level, being picked off by the first to come into the area
- Medium: More sophisticated techniques like panning helped discover gold from streams of water
- Hard: Capital was required to develop technologies which made it possible for the discovery of gold from more difficult places
Easy come, easy go
The sequence of wealth creation in mining gold is exactly the kind of behavior that will be experienced in light of the decreasing cost of whole genome sequencing. While there are many potential applications of mining genomic information, the companies that exist today operate in the easy category. First order analysis exhibited by 23andme and Knome represent highly naive or simple approaches to making genomic data palatable - one for a consumer audience and one for a medical or research audience. Companies like DNAnexus exist to take very simple algorithms for analysis of sequence pipelines, basic visualization and make these tools available to more researchers for their use. This is simply repackaging and consumerizing existing approaches.
The watershed moment is coming for Genomics
Genome sequencing is already undergoing a significant reduction in cost, but its watershed moment will come when we are able to migrate from expensive methods of sequencing (<$1000), which require taking samples to labs, to much lower cost sequencing that can happen remotely and do not take up a lot of space (>$100). Combining mobility in computing with mobility in sequencing will provide many compounding effects. This is very exciting to me.
Nanopores are the shovels
Fortunately, the enabling technology we are waiting for is nanopore sequencing. Two companies - Genia and Oxford Nanopore - are going to be significant providers of infrastructure to enable others to build services more easily on top of genomic information. Continuing the mining analogy these are these companies are making shovels and customer engagement with these technologies will drive their continued evolution over time. Therefore these are key technologies to understand.
In order to derive a genome sequence you have to read the exact composition of DNA and its component bases - A, C, T and G. Reading can be done directly (literally seeing the DNA; EM approaches like Halcyon Molecular and ZS Genetics are representative of direct sequencing), or more commonly indirectly, by determining the indirect presence of the sequence of bases (A, C, T and G) through a secondary chemical reaction.
Nanopore sequencing exists in the secondary reaction category where threading a single stranded DNA sequence through an artificial protein channel - the nanopore - exposes the bases. Depending on the base change the ionic flow and therefore changes the current passing through the membrane. The resulting current for each base allows us to indirectly derive the genomic sequence.
Nanopore sequencers will be smaller because nanopores themselves are small, and as semiconductor coupled nanopores can be combined with cheap mobile computing, and wireless connectivity allow for reading of samples, as well as generation of data in a small package. This is a desired outcome, but portability and cheap sequencing will drive nanopore sequencing to be the breakout technology of genomic mining.
Shovels need fertile ground for exploration
So with cheap sequencing where can we pan for gold? Panning is a suitable analogy for the next phase. Sequencing generates data, but the origin of the data is where value is created. One of the key lessons from starting Quid is that to build a valuable company one needs to have unique data and a unique methodology for panning for gold. Proprietary data and proprietary algorithms for transforming raw sequence data into valuable molecules for identification, and their validation is key to finding gold.
Proprietary data comes from the population one chooses to sample. Complexity in deriving value from genomics is proportional to the complexity of the organism. Therefore the first phase of panning will start with simpler organisms like viruses and bacteria before moving up the chain to ourselves. There are many valuable populations to sample genomic information from. One example, is in finding new drugs, by sampling the natural diversity of molecules in soil. Some of our most significant drugs like pennicilin are natural molecules. Panning for drugs is already seeing applications of natural discovery - Warpdrive Bio is a company leading the way, with a trailblazing partnership with Sanofi. Sampling the genomic diversity of the ocean, or the many bacterial populations within us - the Microbiome- are fertile opportunities for mining and discovery.
How fine be thy sieve?
However, just like panning, filtering the data intelligently, and applying techniques to transform a messy sample into an understandable result is key. Metagenomics is the application of statistical learning techniques to determine individual sequences from a sample that contains the fragments of diverse organisms. Bioinformatic techniques to identify known molecules, and network analysis to identify the population of related molecules can be applied. I have some understanding of data analysis techniques from network analysis from my work at Quid, and so the application of these approaches in mining of genomic data is very interesting. Its also not just enough to determine important molecules computationally, but high throughput molecular biology, and some synthetic biology work can help characterize important molecules and truly seperate the wheat from the chaff.
More broadly, mining and panning efforts will require the interdisciplinary collaboration of software engineers, computer scientists, biochemists and more applied practitioners such as medical professionals and those in the Pharma/Biotech industries. The potential reward for an ambitious group like this is discovering new drugs, treatments, and data which may increase our lifespan, fight disease, and in turn generate great wealth.
I’m thinking a lot about the issues presented here, have thoughts or want to chat? email me on sumon [dot] sadhu @ gmail [dot] com
May the Goldrush commence.
p.s. I intend to go into the topic mentioned here with more technical depth in future posts. This is merely an introductory narrative.