Data Linkage: From Theory to Practice
Venue: Highfield Campus, University of Southampton, UK
Presenter: Dr Natalie Shlomo
Dates of Course: Wednesday 10th - Friday 12th April 2013
This course has already run. Please check the course listings for future courses.
Summary of Course:
The course will introduce the basic concepts and methods of record linkage and will discuss the methodological and statistical aspects of this emerging area. It will cover deterministic and probabilistic approaches to record linkage, including pre-matching processes, determining matching weights, decision theory and types of error, evaluating the quality of linkage procedures, and the statistical properties of linked datasets. By the end of the course, participants should understand record linkage techniques and be able to implement and evaluate record linkage procedures. The course does not assume any prior knowledge of record linkage, and one session will be devoted to revising the basic concepts of probability theory needed to understand probabilistic record linkage. The course will have a strong practical emphasis and will include tutorials and a computer workshop in which participants put the taught methods into practice. The course will use SAS, although no prior familiarity with SAS is required.
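The matching weights and decision rule mentioned above can be illustrated with a minimal Python sketch of the Fellegi and Sunter (1969) framework. This is not the course's SAS material. The field names, the m-probabilities (the chance that a field agrees when the pair is a true match), the u-probabilities (the chance that it agrees when the pair is not a match) and the two thresholds are all hypothetical values chosen for illustration; in practice they would be estimated from the data, for example with the EM algorithm.

```python
import math

# Hypothetical m- and u-probabilities for each comparison field (illustrative only).
# m = P(field agrees | the pair is a true match)
# u = P(field agrees | the pair is not a match)
M_PROBS = {"surname": 0.95, "first_name": 0.90, "birth_year": 0.98, "postcode": 0.85}
U_PROBS = {"surname": 0.01, "first_name": 0.05, "birth_year": 0.02, "postcode": 0.001}

# Hypothetical decision thresholds on the log2 weight scale.
UPPER = 10.0
LOWER = 0.0


def field_weight(field, agrees):
    """Agreement weight log2(m/u), or disagreement weight log2((1-m)/(1-u))."""
    m, u = M_PROBS[field], U_PROBS[field]
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(rec_a, rec_b):
    """Total weight for a record pair: the sum of its field weights.
    Summing assumes fields agree independently given match status."""
    return sum(field_weight(f, rec_a[f] == rec_b[f]) for f in M_PROBS)


def classify(weight):
    """Three-way decision rule: link, non-link, or clerical review."""
    if weight >= UPPER:
        return "link"
    if weight <= LOWER:
        return "non-link"
    return "possible link (clerical review)"


# Hypothetical record pair that disagrees only on first name.
rec_a = {"surname": "SMITH", "first_name": "JOHN", "birth_year": 1970, "postcode": "SO17 1BJ"}
rec_b = {"surname": "SMITH", "first_name": "JON", "birth_year": 1970, "postcode": "SO17 1BJ"}

w = match_weight(rec_a, rec_b)
print(round(w, 2), classify(w))  # 18.67 link
```

Pairs scoring between the two thresholds are sent for clerical review; moving the thresholds trades false matches against missed matches, which is where the types of error come in.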
Course Objectives:
- To develop an understanding of the theory of data linkage techniques
- To enable participants to implement a probabilistic data linkage procedure
- To provide tools for evaluating and assessing the quality of the data
This course will include the following topics:
- Introduction and types of record linkage methods
- Sources for record linkage
- Examples of record linkage applications
- Pre-matching processes (data cleaning, standardizing and parsing of fields)
- Revision of probability and odds, Bayes' Theorem and hypothesis testing
- Deterministic matching
- Probabilistic matching
- Field agreement weights and frequency based weights
- String Comparators
- Blocking variables
- Evaluation of record linkage
- Introduction to the EM algorithm
- Introduction to the analysis of linked datasets
- Computing lab in SAS - applying record linkage to two datasets
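Two of the topics above, blocking variables and string comparators, can be sketched together in Python. This is an illustrative sketch only, not the course material: it assumes a hypothetical blocking key (first letter of the name plus birth year) and hypothetical example records, and it uses the Jaro-Winkler comparator, one widely used string comparator in the record linkage literature, in its common form with prefix scale 0.1 and a maximum prefix of four characters.

```python
from collections import defaultdict


def jaro(s1, s2):
    """Jaro similarity between two strings (1.0 = identical, 0.0 = nothing in common)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they lie within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (transpositions).
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Boost the Jaro score for strings sharing a common prefix (up to max_prefix chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def blocking_key(rec):
    """Hypothetical blocking key: first letter of the name plus birth year."""
    return (rec["name"][:1], rec["birth_year"])


def candidate_pairs(file_a, file_b, key):
    """Yield only the record pairs whose blocking keys agree."""
    blocks_b = defaultdict(list)
    for rec in file_b:
        blocks_b[key(rec)].append(rec)
    for rec_a in file_a:
        for rec_b in blocks_b.get(key(rec_a), []):
            yield rec_a, rec_b


# Hypothetical example files.
file_a = [
    {"id": "A1", "name": "MARTHA", "birth_year": 1970},
    {"id": "A2", "name": "JONES", "birth_year": 1985},
]
file_b = [
    {"id": "B1", "name": "MARHTA", "birth_year": 1970},
    {"id": "B2", "name": "JOHNSON", "birth_year": 1985},
    {"id": "B3", "name": "MORTON", "birth_year": 1970},
]

for rec_a, rec_b in candidate_pairs(file_a, file_b, blocking_key):
    score = jaro_winkler(rec_a["name"], rec_b["name"])
    print(rec_a["id"], rec_b["id"], round(score, 3))
```

Here blocking cuts the six possible comparisons down to three candidate pairs. The cost is that any true match whose blocking key values differ (for example, because of a typo in the first letter of the name) is never compared.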
The course is aimed at researchers who need to understand record linkage techniques. It emphasizes putting theory into practice for those who need to carry out record linkage in their own work. Participants may be academic researchers in the social and health sciences, or may work in government, survey agencies or official statistics, for charities, or in the private sector.
The course does not assume any prior knowledge of record linkage, and a special session will be devoted to revising the probability theory needed to understand probabilistic record linkage. No familiarity with SAS is assumed.
Participants will receive written course notes, tutorials and computing lab material.
Recommended Reading:
- Belin, T. R. and Rubin, D. B. (1995) A Method for Calibrating False-Match Rates in Record Linkage. Journal of the American Statistical Association, 90, 694-707.
- Fellegi, I. P. and Sunter, A. B. (1969) A Theory for Record Linkage. Journal of the American Statistical Association, 64, 1183-1210.
- Gill, L. (2001) Methods for Automatic Record Matching and Linkage and their Use in National Statistics. The National Statistics Methodology Series, ONS. Available at http://www.statistics.gov.uk/downloads/theme_other/GSSMethodology_No_25_v2.pdf
- Herzog, T. N., Scheuren, F. J. and Winkler, W. E. (2007) Data Quality and Record Linkage Techniques. New York: Springer. ISBN 978-0-387-69502-0
- Mason, C. A. and Tu, S. (2008) Data Linkage Using Probabilistic Decision Rules: A Primer. Birth Defects Research (Part A): Clinical and Molecular Teratology, 82, 812-821.
- Winglee, M., Valliant, R. and Scheuren, F. (2005) A Case Study in Record Linkage. Survey Methodology, 31(1), 3
- Winkler, W. E. (1995) Matching and Record Linkage. In B. G. Cox et al. (eds) Business Survey Methods. New York: J. Wiley, 355-384. Available at http://www.fcsm.gov/working-papers/wwinkler.pdf
- Winkler, W. E. (2008) Record Linkage References (version of 1 March 2008). Available at http://www.hcp.med.harvard.edu/statistics/survey-soft/docs/WinklerReclinkRef.pdf
Natalie Shlomo is a lecturer in the Social Statistics Division and the Coordinator and Academic Tutor for the MSc in Official Statistics programme at the University of Southampton. She has extensive knowledge of survey methods, including data processing methods such as record linkage, edit and imputation, and statistical disclosure control.
Course Fees:
£30 per day for UK-registered students. £60 per day for staff from UK academic institutions (including research centres), ESRC-funded researchers and UK-registered charitable organisations. £220 per day for all other participants. The course fee includes course materials, lunches, and morning and afternoon refreshments. Travel and accommodation are to be arranged and paid for by the participant.
Location and Accommodation:
The course will be held at the Southampton Statistical Sciences Research Institute, Building 39, University of Southampton, Southampton, SO17 1BJ. Participants must make their own accommodation arrangements.
The course will start with registration and coffee at 9.30am, and formal teaching will begin at 10.00am on the first day and on all subsequent days. Lectures will run until 5.00pm, except on the last day, when formal teaching will end at about 3.30pm. Afterwards, participants will have the opportunity to ask questions about the course and to discuss with the instructor how to link their own datasets (you are welcome to bring your own data to the course).