An anonymous reader quotes a report from Motherboard: In a study published in Nature on July 3, researchers from the Lawrence Berkeley National Laboratory used an algorithm called Word2Vec sift through scientific papers for connections humans had missed. Their algorithm then spit out predictions for possible thermoelectric materials, which convert heat to energy and are used in many heating and cooling applications. The algorithm didn’t know the definition of thermoelectric, though. It received no training in materials science. Using only word associations, the algorithm was able to provide candidates for future thermoelectric materials, some of which may be better than those we currently use.
To train the algorithm, the researchers assessed the language in 3.3 million abstracts related to material science, ending up with a vocabulary of about 500,000 words. They fed the abstracts to Word2vec, which used machine learning to analyze relationships between words. Using just the words found in scientific abstracts, the algorithm was able to understand concepts such as the periodic table and the chemical structure of molecules. The algorithm linked words that were found close together, creating vectors of related words that helped define concepts. In some cases, words were linked to thermoelectric concepts but had never been written about as thermoelectric in any abstract they surveyed. This gap in knowledge is hard to catch with a human eye, but easy for an algorithm to spot. After showing its capacity to predict future materials, researchers took their work back in time, virtually. They scrapped recent data and tested the algorithm on old papers, seeing if it could predict scientific discoveries before they happened. Once again, the algorithm worked. “In one experiment, researchers analyzed only papers published before 2009 and were able to predict one of the best modern-day thermoelectric materials four years before it was discovered in 2012,” the report adds.