Automating Science

Data science algorithms and natural language processing techniques have been used to predict scientific discoveries and technological innovations, such as materials with targeted electrochemical properties. However, these artificial intelligence methods ignore the distribution of experts, researchers, and inventors who make up the human workforce of science and technology, and in doing so they overlook an enormous amount of information about the future direction of discoveries and innovations. In this project, we aim to boost the predictive power of content-based models by incorporating the distribution of scientists and innovators, so that a candidate discovery is marked as likely to occur only if a sufficiently large population of researchers is studying it. We showed that simply accounting for the authorship network among scientists increases the precision of discovery prediction by around 100% in materials science and by more than 300% in drug repurposing for the treatment of COVID-19.
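To make the filtering idea concrete, here is a toy sketch in Python. It is not the project's actual method. It assumes that some content-based model has already produced a score for each candidate (the `content_scores` input), and that each paper is a simple record listing its authors and the entities it mentions. Under that assumption, a candidate counts as "studied" if at least `min_researchers` distinct authors have published on both the candidate and the target property. All function names, field names, and thresholds are hypothetical.

```python
from collections import defaultdict


def researchers_per_entity(papers):
    """Map each entity (material, property, drug, ...) to the set of authors who mention it.

    Hypothetical input format: each paper is {"authors": [...], "entities": [...]}.
    """
    authors_of = defaultdict(set)
    for paper in papers:
        for entity in paper["entities"]:
            authors_of[entity].update(paper["authors"])
    return authors_of


def human_aware_filter(content_scores, papers, target, min_researchers):
    """Keep candidates that enough researchers connect to the target, ranked by content score.

    content_scores: dict mapping candidate -> score from a content-based model (assumed given).
    Assumed rule: a candidate is kept only if at least `min_researchers` authors
    have published on both the candidate and the target property.
    """
    authors_of = researchers_per_entity(papers)
    target_authors = authors_of.get(target, set())
    kept = []
    for candidate, score in content_scores.items():
        # Authors who have worked on both the candidate and the target property.
        shared = authors_of.get(candidate, set()) & target_authors
        if len(shared) >= min_researchers:
            kept.append((candidate, score))
    # Highest content score first among the candidates that pass the population filter.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up toy corpus: labels are placeholders, not real materials or data.
papers = [
    {"authors": ["A", "B"], "entities": ["mat_1", "prop_X"]},
    {"authors": ["C"], "entities": ["mat_2", "prop_X"]},
    {"authors": ["C", "D"], "entities": ["mat_2"]},
    {"authors": ["E"], "entities": ["mat_3"]},
]
content_scores = {"mat_1": 0.7, "mat_2": 0.9, "mat_3": 0.95}

print(human_aware_filter(content_scores, papers, "prop_X", min_researchers=1))
# → [('mat_2', 0.9), ('mat_1', 0.7)]
```

In this toy run, `mat_3` has the highest content score but is dropped because no author connects it to `prop_X`. Raising `min_researchers` to 2 would keep only `mat_1`, which two authors link to the property. A real system would likely weight or rank by the authorship signal rather than apply a hard cutoff; the threshold here is only the simplest way to express the rule stated above.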