New Research

About this Section

As Knowledge Lab and the Metaknowledge Research Network produce results, those results will populate this space. Our goal is to present citations (and links) to published research, preprints, articles, perspectives, and responses, as well as presentations of our work and results formatted for a general audience.

New Findings:

Tradition and Innovation in Scientists' Research Strategies

What factors affect a scientist’s choice of research problem? Qualitative research in the history, philosophy, and sociology of science suggests that this choice is shaped by an “essential tension” between the professional demand for productivity and a conflicting drive toward risky innovation. We examine this tension empirically in the context of biomedical chemistry. We use complex networks to represent the evolving state of  scientific knowledge, as expressed in publications. We then define research strategies relative to these networks. Scientists can introduce novel chemicals or chemical relationships—or delve deeper into known ones. They can consolidate existing knowledge clusters, or bridge distant ones. Analyzing such choices in aggregate, we find that the distribution of strategies remains remarkably stable, even as chemical knowledge grows dramatically. High-risk strategies, which explore new chemical relationships, are less prevalent in the literature, reflecting a growing focus on established knowledge at the expense of new opportunities. Research following a risky strategy is more likely to be ignored but also more likely to achieve high impact and recognition. While the outcome of a risky strategy has a higher expected reward than the outcome of a conservative strategy, the additional reward is insufficient to compensate for the additional risk. By studying the winners of 137 different prizes in biomedicine and chemistry, we show that the occasional “gamble” for extraordinary impact is the most plausible explanation for observed levels of risk-taking. Our empirical demonstration and unpacking of the “essential tension” suggests policy interventions that may foster more innovative research.

For More: Foster, Jacob G., Andrey Rzhetsky, and James A. Evans. “Tradition and Innovation in Scientists’ Research Strategies.” American Sociological Review 0003122415601618 (2015). doi:10.1177/0003122415601618.


Weaving the fabric of science: Dynamic network models of science's unfolding structure

Science is a complex system. Building on Latour's actor network theory, we model published science as a dynamic hypergraph and explore how this fabric provides a substrate for future scientific discovery. Using millions of abstracts from MEDLINE, we show that the network distance between biomedical things (i.e., people, methods, diseases, chemicals) is surprisingly small. We then show how science moves from questions answered in one year to problems investigated in the next through a weighted random walk model. Our analysis reveals intriguing modal dispositions in the way biomedical science evolves: methods play a bridging role and things of one type connect through things of another. This has the methodological implication that adding more node types to network models of science and other creative domains will likely lead to a superlinear increase in prediction and understanding.

For More: Shi, F., Foster, J.G., Evans, J.A., 2015. Weaving the fabric of science: Dynamic network models of science’s unfolding structure. Social Networks 43, 73–85. doi:10.1016/j.socnet.2015.02.006

The Modular Community Structure of Linguistic Predication Networks

This paper examines the structure of linguistic predications in English text. Identified by the copular “is-a” form, predications assert category membership (hypernymy) or equivalence (synonymy) between two words. Because predication expresses ontological structure, we hypothesize that networks of predications will form modular groups. To measure this, we introduce a semantically motivated measure of predication strength to weight relevant predications observed in text. Results show that predications do indeed form modular structures without any weighting (Q ≈ 0.6) and that using predication strength increases this modularity (Q ≈ 0.9) without discarding low-frequency items. This high level of modularity supports the network-based analysis and the use of predication strength as a way to extract dense semantic clusters. Additionally, words’ centrality within communities exhibits slight correlation with hypernym depths in WordNet, underscoring the ontological organization of predication.
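To make the modularity score Q in the abstract above more concrete, here is a minimal sketch of Newman modularity for a weighted, undirected graph with a given partition. The toy predications, the community labels, and the weights are invented for illustration. The paper's actual predication-strength measure is not described here, so the weights below are only a placeholder for it, and treating "is-a" links as undirected is a simplifying assumption.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q for an undirected weighted graph.

    edges: iterable of (u, v, weight); community: dict mapping node -> label.
    Q = sum over communities of (internal weight / m) - (community strength / 2m)^2
    """
    node_strength = defaultdict(float)  # weighted degree of each node
    internal = defaultdict(float)       # edge weight inside each community
    total = 0.0                         # total edge weight m
    for u, v, w in edges:
        node_strength[u] += w
        node_strength[v] += w
        total += w
        if community[u] == community[v]:
            internal[community[u]] += w
    comm_strength = defaultdict(float)
    for node, s in node_strength.items():
        comm_strength[community[node]] += s
    return sum(
        internal[c] / total - (comm_strength[c] / (2 * total)) ** 2
        for c in comm_strength
    )

# Hypothetical "X is-a Y" predications (not taken from the paper's data).
predications = [
    ("poodle", "dog"), ("dog", "animal"), ("poodle", "animal"),
    ("sedan", "car"), ("car", "vehicle"), ("sedan", "vehicle"),
    ("animal", "vehicle"),  # a spurious predication bridging the two groups
]
community = {
    "poodle": "animals", "dog": "animals", "animal": "animals",
    "sedan": "vehicles", "car": "vehicles", "vehicle": "vehicles",
}

# Unweighted: every predication counts equally.
q_plain = modularity([(u, v, 1.0) for u, v in predications], community)

# Weighted: made-up strengths that down-weight the spurious bridge.
weights = {pair: 1.0 for pair in predications}
weights[("animal", "vehicle")] = 0.1
q_weighted = modularity([(u, v, weights[(u, v)]) for u, v in predications], community)

print(q_plain, q_weighted)
```

In this toy case, down-weighting the bridging predication raises Q without removing any edge, which mirrors the qualitative claim that weighting can increase modularity without discarding low-frequency items.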

Extracting Clusters of Specialist Terms from Unstructured Text

Automatically identifying related specialist terms is a difficult and important task required to understand the structure of less prominent portions of the lexicon. Terms are often defining features of a particular domain. We develop a corpus-based method of extracting coherent clusters of satellite terminology (terms on the edge of the lexicon) using co-occurrence networks from unstructured text. Clusters are identified by extracting communities in the co-occurrence graph, after which we discard the largest community and rank words in the remaining groups by centrality. The method is computationally tractable on large corpora and requires no document structure and only minimal normalization. Results suggest that the method does indeed extract coherent groups of satellite terms in corpora with varying content, style and structure. The results also show that language consists of a densely connected core (previously found in dictionary structure) and has systematic, semantically coherent structure on the fringe of the observed vocabulary.
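The pipeline in the abstract above (build a co-occurrence graph, extract groups, discard the largest, rank the rest by centrality) can be sketched roughly as follows. This is an illustration under loud assumptions, not the authors' implementation: connected components are used as a crude stand-in for a real community-detection algorithm, degree is used as the centrality measure, and the toy documents are invented. On a realistic corpus most terms would fall into one giant component, so a proper community-detection method would be needed.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(documents):
    """Undirected graph linking every pair of terms that share a document."""
    graph = defaultdict(set)
    for doc in documents:
        for u, v in combinations(sorted(set(doc)), 2):
            graph[u].add(v)
            graph[v].add(u)
    return dict(graph)

def connected_components(graph):
    """Stand-in for community detection (an assumption, see lead-in)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components

def satellite_clusters(documents):
    """Drop the largest group, then rank terms in each remaining group by degree."""
    graph = cooccurrence_graph(documents)
    groups = sorted(connected_components(graph), key=len, reverse=True)
    fringe = groups[1:]  # discard the largest group (the densely connected core)
    # Rank by degree (descending), breaking ties alphabetically.
    return [sorted(g, key=lambda t: (-len(graph[t]), t)) for g in fringe]

# Hypothetical tokenized documents: a general-vocabulary core and two fringe topics.
documents = [
    ["data", "model", "result"],
    ["data", "analysis", "method"],
    ["zeolite", "adsorption"],
    ["zeolite", "isotherm"],
    ["zeolite", "catalyst"],
    ["quark", "gluon"],
    ["quark", "hadron"],
]

clusters = satellite_clusters(documents)
for cluster in clusters:
    print(cluster)
```

Each returned list is one fringe cluster with its most connected term first, which is where a domain's defining term would be expected to appear under this simplified ranking.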


Quantifying the Impact and Extent of Undocumented Biomedical Synonymy

Automated systems that extract and integrate information from the research literature have become common in biomedicine. As the same meaning can be expressed in many distinct but synonymous ways, access to comprehensive thesauri may enable such systems to maximize their performance. Here, we establish the importance of synonymy for a specific text-mining task (named-entity normalization), and we suggest that current thesauri may be woefully inadequate in their documentation of this linguistic phenomenon. To test this claim, we develop a model for estimating the amount of missing synonymy. We apply our model to both biomedical terminologies and general-English thesauri, predicting massive amounts of missing synonymy for both lexicons. Furthermore, we verify some of our predictions for the latter domain through “crowd-sourcing.” Overall, our work highlights the dramatic incompleteness of current biomedical thesauri, and to mitigate this issue, we propose the creation of “living” terminologies, which would automatically harvest undocumented synonymy and help smart machines enrich biomedicine.

For More: Blair, D.R., Wang, K., Nestorov, S., Evans, J.A., Rzhetsky, A., 2014. Quantifying the Impact and Extent of Undocumented Biomedical Synonymy. PLoS Comput Biol 10 (September, 2014)


Finding Cultural Holes: How Structure and Culture Diverge in Networks of Scholarly Communication


Divergent interests, expertise, and language form cultural barriers to communication. No formalism has been available to characterize these “cultural holes.” Here, we use information theory to measure cultural holes, and demonstrate our formalism in the context of scientific communication using papers from JSTOR. We extract scientific fields from the structure of citation flows, and infer field-specific cultures by cataloguing phrase frequencies in full text and measuring the relative efficiency of between-field communication. We then combine citation and cultural information in a novel topographic map of science, mapping citations to geographic distance and cultural holes to topography. By analyzing the full citation network, we find that communicative efficiency decays with citation distance in a field-specific way. These decay rates reveal hidden patterns of cohesion and fragmentation. For example, the ecological sciences are balkanized by jargon, while the social sciences are relatively integrated. Our results highlight the importance of enriching structural analyses with cultural data.

For More: Vilhena, Daril A., Jacob G. Foster, Martin Rosvall, Jevin D. West, James Evans, and Carl T. Bergstrom. 2014. “Finding Cultural Holes: How Structure and Culture Diverge in Networks of Scholarly Communication.” Sociological Science 1: 221-238.
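As a rough illustration of how phrase frequencies could yield an information-theoretic measure of between-field communication, the sketch below compares the entropy of one field's phrase distribution with the cross-entropy incurred when those phrases are decoded using another field's distribution. The abstract does not give the exact formula, so the ratio used here, the additive smoothing, the field names, and the phrase lists are all assumptions made for illustration.

```python
import math
from collections import Counter

def phrase_distribution(phrases, vocabulary, alpha=1.0):
    """Additively smoothed phrase probabilities over a shared vocabulary.

    Smoothing (an assumption) keeps every probability positive so that
    cross-entropy stays finite for phrases the other field never uses.
    """
    counts = Counter(phrases)
    total = len(phrases) + alpha * len(vocabulary)
    return {term: (counts[term] + alpha) / total for term in vocabulary}

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p.values() if pi > 0)

def cross_entropy(p, q):
    """Expected code length when messages drawn from p use a code built for q."""
    return -sum(pi * math.log2(q[term]) for term, pi in p.items() if pi > 0)

def communication_efficiency(p, q):
    """H(p) / H(p, q): equals 1.0 when writer and reader share a distribution."""
    return entropy(p) / cross_entropy(p, q)

def cultural_hole(p, q):
    """One minus efficiency: larger values mean more is lost between fields."""
    return 1.0 - communication_efficiency(p, q)

# Hypothetical phrase samples for two fields (not drawn from JSTOR).
ecology = ["niche", "niche", "biomass", "trophic cascade", "network"]
sociology = ["network", "network", "habitus", "social capital", "niche"]
vocabulary = sorted(set(ecology) | set(sociology))

p_eco = phrase_distribution(ecology, vocabulary)
p_soc = phrase_distribution(sociology, vocabulary)

hole_self = cultural_hole(p_eco, p_eco)
hole_eco_soc = cultural_hole(p_eco, p_soc)
print(hole_self, hole_eco_soc)
```

Because cross-entropy is never smaller than entropy, this quantity lies between 0 and 1, and it is generally asymmetric: the loss from ecology to sociology need not equal the loss in the other direction.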



Attention to Local Health Burden and the Global Disparity of Health Research

Most studies on global health inequality consider unequal health care and socio-economic conditions but neglect inequality in the production of health knowledge relevant to addressing disease burden. We demonstrate this inequality and identify likely causes. Using disability-adjusted life years (DALYs) for 111 prominent medical conditions, assessed globally and nationally by the World Health Organization, we linked DALYs with MEDLINE articles for each condition to assess the influence of DALY-based global disease burden, compared to the global market for treatment, on the production of relevant MEDLINE articles, systematic reviews, clinical trials and research using animal models vs. humans. We then explored how DALYs, wealth, and the production of research within countries correlate with this global pattern. We show that global DALYs for each condition had a small, significant negative relationship with the production of each type of MEDLINE article for that condition. Local processes of health research appear to be behind this. Clinical trials and animal studies but not systematic reviews produced within countries were strongly guided by local DALYs. More and less developed countries had very different disease profiles, and rich countries published much more than poor countries. Accordingly, conditions common to developed countries garnered more clinical research than those common to less developed countries. Many of the health needs in less developed countries do not attract attention among developed country researchers, who produce the vast majority of global health knowledge, including clinical trials, in response to their own local needs. This raises concern about the amount of knowledge relevant to poor populations deficient in their own research infrastructure. We recommend measures to address this critical dimension of global health inequality.

For More: Evans, James A., Jae-Mahn Shim, and John P. A. Ioannidis. “Attention to Local Health Burden and the Global Disparity of Health Research.” PLoS ONE 9, no. 4 (April 1, 2014)




Representative Publications:

Future Science

“An emerging area of interest in research on the ‘science of science’ is the prediction of future impact. Impact prediction is consequential for the evaluation of research grants, the dispensing of scholarly awards, and the determination of faculty salaries, among other decisions. As predictions improve, they will play a larger role in directing choices about what areas public and private capital will choose to research, develop, and produce. But how can we predict the future?”

For More: Evans, James A. “Future Science.” Science 342, no. 6154 (October 4, 2013): 44–45.




Metaknowledge

“The growth of electronic publication and informatics archives makes it possible to harvest vast quantities of knowledge about knowledge, or ‘metaknowledge.’ We review the expanding scope of metaknowledge research, which uncovers regularities in scientific claims and infers the beliefs, preferences, research tools, and strategies behind those regularities. Metaknowledge research also investigates the effect of knowledge context on content. Teams and collaboration networks, institutional prestige, and new technologies all shape the substance and direction of research. We argue that as metaknowledge grows in breadth and quality, it will enable researchers to reshape science—to identify areas in need of reexamination, reweight former certainties, and point out new paths that cut across revealed assumptions, heuristics, and disciplinary boundaries.”

For More: Evans, James A., and Jacob G. Foster. “Metaknowledge.” Science 331, no. 6018 (February 11, 2011): 721–725.



Machine Science

“Scientists today cannot hope to manually track all of the published science relevant to their work. A cancer biologist, for instance, can find more than 2 million relevant papers in the PubMed archive, more than 200 million Web pages with a Google search, and databases holding results from experiments that produce millions of gigabytes of data.”

For More: Evans, James, and Andrey Rzhetsky. “Machine Science.” Science 329, no. 5990 (July 23, 2010): 399–400.
