By Eric Collins, Expedition Co-Principal Investigator, University of Alaska Fairbanks
The Arctic Marine Microbes project is exploring the biodiversity of microorganisms in extreme environments in the Arctic, but how exactly do we do that?
For more than a century, microbiologists relied on growing cells in the laboratory to investigate their diversity and function, but that paradigm began to shift dramatically in the past few decades. During that time, scientists made an observation that came to be known as the “Great Plate Count Anomaly”: the microbes that grow on petri plates represent less than one percent of those detected by methods that bypass this ‘culturing’ step.
As explained previously, the key to bypassing this step is to look directly at the genetic material of the microbes. Here we will explain the cutting-edge ‘metagenomics’ methods used by scientists to explore this hidden microbial realm.
Every organism has a genome made out of DNA, which can be thought of as a recipe book for making a cell. Each genome contains a different set of recipes, some of which are common and some are rare. Each recipe makes a different protein that the cell needs. Closely related microbes share similar recipe books, while distantly related microbes may only share the most important recipes.
If printed in 12-point font, each recipe book would be about 1,000 pages long and have three recipes per page. A cup of seawater contains perhaps 30,000,000 microbes, so the Library of Genomes in the cup would contain as many books as the Library of Congress!
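The arithmetic behind this comparison can be checked in a few lines of Python. This sketch simply multiplies out the rough figures quoted above (1,000 pages per book, three recipes per page, 30,000,000 microbes per cup, one genome “book” per microbe); the variable names are illustrative only.

```python
# Rough, order-of-magnitude figures taken from the text above
PAGES_PER_BOOK = 1_000          # one genome printed in 12-point font
RECIPES_PER_PAGE = 3            # protein recipes (genes) per page
MICROBES_PER_CUP = 30_000_000   # one genome "book" per microbe

recipes_per_book = PAGES_PER_BOOK * RECIPES_PER_PAGE
total_pages = PAGES_PER_BOOK * MICROBES_PER_CUP

print(f"Recipes per book: {recipes_per_book:,}")   # 3,000
print(f"Books in one cup: {MICROBES_PER_CUP:,}")   # 30,000,000
print(f"Pages in one cup: {total_pages:,}")        # 30,000,000,000
```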
The first thing we need to do if we want to read these recipe books is to learn the language they are written in. Compared to English, which has 26 letters, DNA is simpler in that it only has 4 letters (A, C, G, T). In English, the average word is about five letters long; in DNA, words are called ‘codons’ and all of the words are three letters long. There are over 200,000 words in the English language, but in DNA there are only 4 × 4 × 4 = 64 possible words.
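The count of 64 comes from choosing one of four letters for each of the three positions in a codon. A short Python sketch, using the standard library’s itertools.product, lists every combination:

```python
from itertools import product

DNA_LETTERS = "ACGT"

# A codon is an ordered string of three DNA letters, so generate
# every 4 x 4 x 4 combination of letters.
codons = ["".join(triplet) for triplet in product(DNA_LETTERS, repeat=3)]

print(len(codons))   # 64
print(codons[:5])    # ['AAA', 'AAC', 'AAG', 'AAT', 'ACA']
```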
Each of these ‘codon words’ is translated into a different language by a cellular machine called the ribosome. This translated language has 21 unique words, which are called amino acids. These ‘amino acid words’ are strung together to make a protein. It is amazing what biology can do with its 21 words: because a typical protein is made up of about 300 amino acid words, there are more possible protein recipes than there are particles in the observable universe!
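To make the translation step concrete, here is a minimal Python sketch built on the standard genetic code. Note that the standard table covers 20 amino acids plus a “stop” signal; the 21st amino acid, selenocysteine, is read from certain TGA codons only in special contexts and is not modeled here. The names CODON_TABLE and translate are illustrative, and the 10^80 figure for particles in the observable universe is a commonly quoted rough estimate.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: one letter per codon, with codons listed in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Read a DNA string three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCTTGGAAATAA"))  # MAWK

# About 300 positions, each filled by one of 21 amino acids
possible_proteins = 21 ** 300
print(len(str(possible_proteins)))   # 397 digits, roughly 10**396
print(possible_proteins > 10 ** 80)  # True
```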
Once we know how to read the language, it’s time to go to the library. When we collect microbes from seawater or other environments, we pass the water through a filter with pores small enough to trap the cells. Once the cells are caught on the filter, we need to break them open to get the DNA out. Some cells are fragile and break open easily, while others are more robust and must be broken open by chemical or physical means.
We add tiny sand-like beads and lytic (cell-dissolving) enzymes to the cells and shake them at high speed in a tube. The enzymes break down the cell walls, while the beads smash against each other and pop the cells caught between them. Once the DNA is released, we clean it up by removing other cellular components such as proteins and cell membranes. After the DNA has been collected and rinsed, it is time to find out what it says.
The process of reading the recipe book of an organism is called ‘DNA sequencing’. There are many technologies that can do this and they each have their benefits and constraints. The method we use is called ‘sequencing by synthesis’ and the machine used is called an Illumina MiSeq.
The first step in preparing the DNA for sequencing is to randomly break it into small pieces. This is like taking all of the books in the Library of Congress and putting them through a shredder, so that each shred has about one line of text. Then we attach short linkers that let each shredded piece of DNA stick to a glass slide. Many millions of pieces of DNA can be attached to each slide, and the slide is then placed under a powerful microscope.
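The random “shredding” can be imitated in a few lines of Python. This is a simplified model under stated assumptions: a made-up toy genome, a fixed read length, and pieces taken from uniformly random start positions. Attaching the linkers is not modeled, and the function name shred is hypothetical.

```python
import random

def shred(genome: str, read_length: int, num_reads: int, seed: int = 0) -> list[str]:
    """Simulate random shotgun fragments: pick random start positions
    and copy out read_length letters from each one."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    max_start = len(genome) - read_length
    if max_start < 0:
        raise ValueError("read_length is longer than the genome")
    reads = []
    for _ in range(num_reads):
        start = rng.randint(0, max_start)  # inclusive on both ends
        reads.append(genome[start:start + read_length])
    return reads

# Hypothetical toy genome, far shorter than a real one
genome = "ATGGCTTGGAAATAAGGCCTTAGCATCGATCGGATTACA"
reads = shred(genome, read_length=10, num_reads=8)
for r in reads:
    print(r)
```

Because the start positions are random, some stretches of the genome are covered several times and others not at all, which is why many copies of each genome help the later reconstruction.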
In DNA, each of the four letters (A, C, G, T) has a partner: A pairs with T, and C pairs with G. When we do sequencing by synthesis, we go through many cycles of asking “which partner do you have?” We do this by adding all four letters to the slide, each tagged with its own fluorescent color. When a piece of attached DNA has the partner to one of these letters, that letter is added to a new strand being built alongside it, and its tag glows when the machine shines a laser on it. The color of the light depends on which letter was added. The microscope records this color for each of the millions of DNA pieces, the tag is removed, and the next cycle begins. The machine automatically goes through several hundred cycles of adding DNA letters and looking for the light when they partner. By tracking the sequence of colors, we can then read the DNA sequence of each line of the shredded recipe books.
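A toy Python model of this cycle-by-cycle reading might look like the following. The color assigned to each letter is a made-up label (the real dyes and optics are not modeled), and the sketch ignores details such as strand direction and reading errors; it only shows how recording the partner letter at each cycle spells out a sequence.

```python
# Base-pairing partners: A-T and C-G
PARTNER = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical color labels for the four tagged letters
COLOR_OF = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
LETTER_OF = {color: letter for letter, color in COLOR_OF.items()}

def sequence_by_synthesis(template: str, cycles: int) -> list[str]:
    """Each cycle, the partner of the next template letter is added to
    the growing strand, and the color of its tag is recorded."""
    colors = []
    for position in range(min(cycles, len(template))):
        added = PARTNER[template[position]]
        colors.append(COLOR_OF[added])
    return colors

def call_bases(colors: list[str]) -> str:
    """Turn the recorded colors back into letters of the new strand."""
    return "".join(LETTER_OF[c] for c in colors)

template = "TACCGAAT"  # hypothetical attached piece of DNA
colors = sequence_by_synthesis(template, cycles=8)
new_strand = call_bases(colors)
print(colors)
print(new_strand)  # ATGGCTTA, the partner of each template letter
```

Since every letter has exactly one partner, the new strand and the attached piece carry the same information, so either one can be used to read the recipe.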
Once we have read the sequence of each shred of DNA, we still have a problem – how do we know which line goes with which recipe book? It is a very difficult problem to reconstruct the Library of Congress line by line, so we use supercomputers to help speed up the process. One thing that helps us reconstruct the Library of Genomes is that there are many books that are similar or identical (i.e., cells of the same species of bacteria).
Since the shredding is a random process, many of these lines will overlap, so by comparing each line with every other line in the computer, we can make a very good guess as to whether they came from the same recipe. When we are able to link many of these lines together into one longer stretch, we call it a ‘contig’ (short for ‘contiguous sequence’). Each contig may contain many pages from a book or only part of a page. In fact, it is rare that we are able to reconstruct an entire recipe book this way, but so far it is the best method we have.
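The overlap idea can be illustrated with a toy “greedy” assembler in Python: it repeatedly finds the two pieces whose end and beginning match over the most letters and glues them together. This is a classroom simplification, not the assembly software used in the project (which is not named here); real metagenome assemblers typically use graph-based methods and must cope with sequencing errors, repeated regions, and many similar genomes at once. The function names and the toy reads are hypothetical.

```python
def overlap(a: str, b: str, min_length: int) -> int:
    """Length of the longest suffix of a that matches a prefix of b
    (at least min_length letters long), or 0 if there is none."""
    start = 0
    while True:
        start = a.find(b[:min_length], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str], min_length: int) -> list[str]:
    """Repeatedly merge the pair of pieces with the longest overlap.
    Simplification: reads buried inside another read are not handled."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_length)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs  # nothing left to join
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)

# Four overlapping "lines" shredded from one toy sequence
reads = ["ATGGCTTGGA", "CTTGGAAATA", "GAAATAAGGC", "ATAAGGCCTT"]
print(greedy_assemble(reads, min_length=3))  # ['ATGGCTTGGAAATAAGGCCTT']
```

Here the four lines join into a single contig; with real data, gaps in coverage and look-alike stretches usually leave many separate contigs instead of one complete book.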
Even without the full book we can learn a lot about the microbial community based on the pages that we are able to read. The pages might tell us that certain bacteria have recipes for breaking down a toxic compound, or that other bacteria have recipes for producing vitamins that another organism needs.
One thing we always keep in mind when we are reading these books is that, like any recipe book, not every recipe will get made and some recipes will be made more often than others. Some recipes require other recipes in order to come out right. Scientists now have methods (called ‘transcriptomics’ and ‘proteomics’) to find out which recipes are actually being used by microbes!
We won’t go into those here, but recent advances in metagenomics, transcriptomics, and proteomics have provided valuable insight into the functions of microbial communities, and future technological advances will make it even easier to read the many books of life in the Arctic Ocean.