Scientists develop an algorithm that can scan millions of papers, PREDICT discoveries and “uncover hidden knowledge”
03/19/2020 // Franz Walker

Machine learning algorithms have been used to train artificial intelligence to do a number of amazing things, from playing chess to predicting customer preferences. Now, a group of researchers has used machine learning to try to uncover hidden scientific knowledge.

Researchers at the U.S. Department of Energy's Lawrence Berkeley National Laboratory (Berkeley Lab) have applied Word2vec, an existing machine learning algorithm that represents words as vectors, to millions of scientific abstracts, and then used what it learned to predict future scientific discoveries. The study has shown that an algorithm with no prior training in materials science can uncover new scientific knowledge without human guidance.

“Without telling it anything about materials science, it learned concepts like the periodic table and the crystal structure of metals,” said team lead Anubhav Jain, a scientist at Berkeley Lab's Energy Storage and Distributed Resources Division. “That hinted at the potential of the technique. But probably the most interesting thing we figured out is, you can use this algorithm to address gaps in materials research, things that people should study but haven't studied so far.”

Teaching a machine to read science

According to lead author Vahe Tshitoyan, a former Berkeley Lab postdoctoral fellow who now works at Google, the project was motivated by the difficulty of making sense of the overwhelming amount of previously published studies.

“In every research field there's 100 years of past research literature, and every week dozens more studies come out,” said Tshitoyan. “A researcher can access only a fraction of that.”


Faced with this challenge, the team decided to make a machine learning algorithm that can make use of all of this collective knowledge without needing intervention from human researchers.

As part of this, the team collected 3.3 million abstracts from papers published in more than 1,000 journals between 1922 and 2018. Word2vec then turned each of the roughly 500,000 words in those abstracts into a 200-dimensional vector, that is, an array of 200 numbers. These vectors capture how the words are related to one another.

Trained on this materials science text, Word2vec was able to learn the meaning of scientific terms and concepts based on the positions of words in the abstracts and how often they occurred alongside other words. The relationships between elements in the periodic table even became visible when the vector for each chemical element was projected onto two dimensions.
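
As a rough illustration of how word positions and co-occurrence can be turned into vectors, the sketch below counts which words appear near each other in a few made-up sentences and compares the resulting count vectors. It is a simplified stand-in, not Word2vec itself (which learns dense vectors with a small neural network), and the sentences, the window size and the function names are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Made-up mini corpus (an assumption for this example, not the study's data).
SENTENCES = [
    "the metal copper has high conductivity",
    "the metal silver has high conductivity",
    "the gas oxygen has low density",
    "the gas nitrogen has low density",
]

def cooccurrence_vectors(sentences, window=2):
    """For every word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = cooccurrence_vectors(SENTENCES)
sim_metals = cosine(vectors["copper"], vectors["silver"])
sim_mixed = cosine(vectors["copper"], vectors["oxygen"])
print(f"copper~silver: {sim_metals:.2f}, copper~oxygen: {sim_mixed:.2f}")
```

In this toy corpus the two metals share all of their neighbouring words, so they score as more similar to each other than a metal and a gas do, even though nothing in the code knows what a metal is.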

Predicting scientific discoveries years in advance

With Word2vec trained on the abstracts, the team tested whether it could predict breakthroughs in the development of novel thermoelectric materials, which are materials that can efficiently convert heat into electricity. When the team ran calculations on the top thermoelectric candidates predicted by the algorithm, they found that all of the top 10 had computed power factors slightly higher than the average of known thermoelectric materials.
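
One way to picture this kind of ranking is to score each candidate material by how close its vector lies to the vector for the word “thermoelectric” and then sort by that score. The sketch below assumes cosine similarity as the score and uses invented three-number vectors and placeholder material names in place of the real 200-dimensional embeddings; none of the values come from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Invented toy vectors (assumptions for illustration only).
TARGET = [1.0, 0.0, 1.0]  # stands in for the vector of "thermoelectric"
CANDIDATES = {
    "material_A": [0.9, 0.1, 0.8],
    "material_B": [0.1, 1.0, 0.0],
    "material_C": [0.5, 0.5, 0.5],
}

def rank_candidates(target, candidates, top_n=2):
    """Return the top_n (name, score) pairs, most similar to the target first."""
    scored = [(name, cosine(target, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

ranking = rank_candidates(TARGET, CANDIDATES)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```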

To further test Word2vec, the team had the algorithm perform experiments “in the past”: the algorithm was given only abstracts up to a certain point in time, for example, the year 2000. From this, Word2vec not only “predicted” breakthroughs in thermoelectrics that had been made since then, it also identified candidate materials that have yet to be studied.
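
The “in the past” setup amounts to hiding everything published after a cutoff year, making predictions from what remains, and then checking them against later work. A minimal sketch of that bookkeeping follows; the records, the material names and the helper functions are hypothetical, and the prediction step is a fixed placeholder list rather than a trained model.

```python
# Hypothetical records: (publication year, material reported as thermoelectric).
RECORDS = [
    (1995, "material_A"),
    (1999, "material_B"),
    (2005, "material_C"),
    (2012, "material_D"),
]

def split_by_cutoff(records, cutoff_year):
    """Separate records visible at the cutoff from those published later."""
    past = [r for r in records if r[0] <= cutoff_year]
    future = [r for r in records if r[0] > cutoff_year]
    return past, future

def count_confirmed(predictions, future_records):
    """Count predictions that show up in the later, held-out records."""
    later = {material for _, material in future_records}
    return sum(1 for material in predictions if material in later)

past, future = split_by_cutoff(RECORDS, cutoff_year=2000)
known = {material for _, material in past}

# Placeholder for a model's output: candidates not already known at the cutoff.
proposed = ["material_C", "material_E", "material_D"]
predictions = [m for m in proposed if m not in known]

hits = count_confirmed(predictions, future)
print(f"{hits} of {len(predictions)} predictions appeared after the cutoff")
```

Predictions that never show up in the later records, such as material_E here, are the ones that may point to materials nobody has examined yet.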

With these results, the team is now working to release the top 50 thermoelectric materials predicted by the algorithm, so that scientists can start working on them. They are also releasing the word embeddings, so that others can build their own applications for other materials. Finally, the team is working on a smarter, more powerful search engine based on the algorithm that should give scientists a more useful way to search abstracts.
