Computationally Scalable Statistical Methods for High-Dimensional Record Linkage

In many databases, it is difficult to identify a true entity and to distinguish between two similar entities (for example, a Kevin Smith who works on networks in the mathematical sense and a Kevin Smith who works on networks in the systems sense). Record linkage is the merging of multiple databases that lack unique identifiers. It is crucial for science and industry, and it is also a difficult statistical problem because databases contain many errors. Steorts and her team propose to develop new techniques for record linkage that will scale to large, high-dimensional databases. Record linkage can help clarify connections between authors and concepts in scientific and technological databases, which can provide insight into how collaborations form and how new theories and technologies come about.