By Dr. Warren Wood, Research Geophysicist, U.S. Naval Research Laboratory
June 24, 2019
Today we are mobilizing – moving equipment onto the ship and making sure everything works as it is supposed to. Things are much easier to fix at the dock than when we are at sea. While that is going on, I wanted to take this opportunity to explain a little more about how machine learning will be used on this cruise.
Machine learning refers to a recently developed type of computer algorithm (program or “App”) that has grown out of, and is similar to, other kinds of artificial intelligence. Machine learning (ML) finds patterns and correlations in data that may be difficult, or just time consuming, for a human to find. It does not tell us why the correlations exist, only how good the correlations are. ML does not “do” science, but it helps us perform the observation portion of the scientific method by picking out patterns. Generating a testable hypothesis still requires a trained scientist.
A simple example of ML is the identification of a particular animal, let’s say a cat, in a large number of photographs. The ML algorithm must be trained with many pictures that include cats at all available angles, poses, environments, and lighting conditions. The algorithm must also be shown pictures with no cats (“not cat”). If it has only seen cats, it will think everything is a cat. If it has been shown only cats in grass, it cannot distinguish between the two, and will likely identify a picture of just grass as “cat.” ML algorithms can only identify what they have been trained to see.
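For readers who like to see the idea in code, here is a toy sketch in Python of the "cat" versus "not cat" problem. It is only an illustration under stated assumptions: each photo is reduced to two made-up numbers (how furry and how green it looks), and the classifier is a very simple nearest-centroid method chosen for brevity, not the algorithm used on this cruise.

```python
from collections import Counter

def train(examples):
    """Nearest-centroid 'training': average the feature vectors for each label."""
    sums, counts = {}, Counter()
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose average example is closest to these features."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical features per photo: (furriness, greenness), each between 0 and 1.
# All values are invented for illustration.
cats_only = [((0.9, 0.8), "cat"), ((0.8, 0.9), "cat")]  # cats, always in grass
balanced = cats_only + [
    ((0.9, 0.1), "cat"),      # a cat indoors, no grass
    ((0.1, 0.9), "not cat"),  # grass with no cat
    ((0.1, 0.1), "not cat"),  # neither cat nor grass
]

grass_photo = (0.05, 0.95)  # a picture of just grass

only_cats_model = train(cats_only)
balanced_model = train(balanced)

# Trained only on cats in grass, the model has no other answer available.
print(predict(only_cats_model, grass_photo))
# Trained on cats and "not cats" in varied settings, it can tell them apart.
print(predict(balanced_model, grass_photo))
```

The first model has only ever seen one label, so every photo comes back as "cat". The second has seen counterexamples and labels the grass photo "not cat".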
On this cruise, we will be examining the microbes associated with shipwrecks in the Gulf of Mexico. We will be using machine learning to determine how well various seafloor quantities, like seafloor slope, distance from the wreck, bottom water current speed, etc., correlate with the various microbial communities around shipwrecks. In our case, we are interested in specific genetic traits of a group of microbes – the genes associated with consuming the metal or wood of wrecks. These genetic markers are analogous to the cats in the example above. We need to find enough examples of the “cats” (genetic markers) and “not cats” in enough different environments that we can correlate the environment to the presence of the genetic markers.
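As a rough sketch of what "correlating the environment to the presence of the genetic markers" can mean in its simplest form, the Python snippet below computes a plain Pearson correlation between each environmental quantity and a presence (1) or absence (0) flag. The field names and every number are invented for illustration; they are not cruise data, and the actual analysis may use quite different methods.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic samples (hypothetical field names and values, not real measurements).
# marker_present: 1 if the genetic marker was detected in the sample, else 0.
samples = [
    {"distance_m": 2,   "slope_deg": 3.0, "marker_present": 1},
    {"distance_m": 5,   "slope_deg": 1.0, "marker_present": 1},
    {"distance_m": 10,  "slope_deg": 4.0, "marker_present": 1},
    {"distance_m": 50,  "slope_deg": 4.0, "marker_present": 0},
    {"distance_m": 100, "slope_deg": 1.0, "marker_present": 0},
    {"distance_m": 200, "slope_deg": 3.0, "marker_present": 0},
]

presence = [s["marker_present"] for s in samples]
correlations = {
    name: pearson([s[name] for s in samples], presence)
    for name in ("distance_m", "slope_deg")
}

for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

In this made-up data set the marker shows up only near the wreck, so distance correlates negatively with presence, while slope shows essentially no relationship. As noted earlier, a correlation like this says how strong the pattern is, not why it exists.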
Unlike the example with cats, the genetic markers that we use to train the ML algorithm must be carefully sampled (by the remotely operated vehicle) and put through a series of sophisticated biochemical processing steps to distinguish presence from absence of genetic markers. The environment where the samples were acquired must be carefully documented. The challenge is to find which aspects of the environment correlate with which genetic markers – if any! Nature is not guaranteed to be easily understandable, but we are using state-of-the-art marine technology, biochemistry, genetics, and computer science to find out what we can!