Data Linkage: An introduction to statistical and methodological issues
Venue: Highfield Campus, University of Southampton, UK
Presenter: Dr Natalie Shlomo
Dates of Course: Wednesday 20th - Friday 22nd January 2010
This course has already run. Please check the course listings for a future course.
Summary of Course:
The course will introduce participants to the basic ideas and methods of data linkage and will discuss methodological and statistical aspects of this emerging area. It will cover deterministic and probabilistic approaches to record linkage, including pre-matching processes, determining field weights, types of errors in hypothesis testing, evaluation of the quality of linkage procedures, and the statistical properties of linked datasets. The course does not assume any prior knowledge of record linkage; however, participants should have knowledge of basic inductive statistics and probability, as well as statistical inference. A session devoted to revision of basic concepts in probability will be included in the syllabus. The course will have a strong practical emphasis and will include a computer workshop to enable participants to put the methods learned into practice. The software used will be SAS; no prior familiarity with SAS is required.
Course Objectives:
- To develop an understanding of the theory of record linkage techniques
- To enable participants to implement a probabilistic record linkage procedure
- To provide tools for evaluating the quality of record linkage
Course Content:
- Introduction and types of record linkage
- Pre-matching processes (data cleaning, standardizing and parsing of fields)
- Revision of probability and odds, Bayes' theorem and hypothesis testing
- Deterministic matching
- Probabilistic matching
- Field agreement weights and frequency based weights
- String comparators
- Blocking variables
- Evaluation of data linkage
- Introduction to the E-M algorithm
- Putting theory into practice - computing lab in SAS
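The Python sketch below illustrates the field agreement weights listed in the course content. The course itself uses SAS. The sketch follows the log-likelihood-ratio formulation of Fellegi and Sunter (1969) and assumes that fields agree independently given match status. Several values are hypothetical and chosen purely for illustration: the field names, the m- and u-probabilities, and the two decision thresholds. In practice the probabilities would be estimated from the data, for example with the E-M algorithm.

```python
import math

# Hypothetical m- and u-probabilities per comparison field (illustrative values only).
# m = P(field agrees | record pair is a true match)
# u = P(field agrees | record pair is a non-match)
FIELD_PROBS = {
    "surname":    (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
    "postcode":   (0.85, 0.001),
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 likelihood-ratio weight for a single field comparison."""
    if agrees:
        return math.log2(m / u)              # agreement weight (positive when m > u)
    return math.log2((1 - m) / (1 - u))      # disagreement weight (negative when m > u)


def total_weight(agreement: dict) -> float:
    """Sum the field weights; valid only under the conditional-independence assumption."""
    return sum(field_weight(agreement[f], m, u) for f, (m, u) in FIELD_PROBS.items())


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Three-way decision rule with two hypothetical thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link (clerical review)"


if __name__ == "__main__":
    pair = {"surname": True, "first_name": True, "birth_year": True, "postcode": False}
    w = total_weight(pair)
    print(f"total weight = {w:.2f} -> {classify(w)}")
```

A pair that agrees on most informative fields receives a large positive total weight. Pairs falling between the two thresholds are left for clerical review rather than being forced into a link or non-link decision.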
Target Audience:
The course is aimed at researchers who need to gain an understanding of record linkage techniques and of analyzing linked datasets. In addition, the course emphasizes putting theory into practice for those who need to carry out data linkage in their own work. Participants may be academic researchers in the social and health sciences, or may work in government, survey agencies or official statistics, for charities, or in the private sector.
Pre-requisite: The course does not assume any prior knowledge of record linkage. However, participants should have a basic knowledge of inductive statistics, probability and statistical inference. No familiarity with SAS will be assumed.
Course Materials: Participants will receive written course notes.
The Instructor:
Natalie Shlomo is a lecturer in the Social Statistics Division and coordinator of the MSc in Official Statistics at the University of Southampton. She has extensive knowledge of survey methods, including data processing methods such as record linkage, editing and imputation.
Course Fee: £30 per day for UK-registered students; £60 per day for staff from UK academic institutions (including research centres), ESRC-funded researchers and registered charity organizations; £220 per day for all other participants. The course fee includes course materials, lunches, and morning and afternoon tea, but not accommodation or travel, which participants must arrange themselves.
Location and Accommodation:
The course will be held at the Southampton Statistical Sciences Research Institute, Building 39, University of Southampton, Southampton, SO17 1BJ. Participants should book their own accommodation according to their individual needs.
Duration:
The course will start with registration and coffee at 9.30 am on the first day. Formal teaching will start at 10.00 am each day and run until 5.00 pm, except on the last day, when it will end at about 3.30 pm. Afterwards, participants will have an opportunity to ask questions about the course and to discuss with the instructor how to link their own datasets (you may bring your own data to the course if you wish).
Preparatory Reading:
Fellegi, I. P. and Sunter, A. B. (1969) A Theory for Record Linkage, Journal of the American Statistical Association, 64, 1183-1210.
Gill, L. (2001) Methods for Automatic Record Matching and Linkage and their use in National Statistics, The National Statistics Methodology Series, ONS (available at http://www.statistics.gov.uk/downloads/theme_other/GSSMethodology_No_25_v2.pdf)
Herzog, T. N., Scheuren, F. J. and Winkler, W. E. (2007) Data Quality and Record Linkage Techniques. New York: Springer. ISBN 978-0-387-69502-0
Mason, C. A. and Tu, S. (2008) Data Linkage Using Probabilistic Decision Rules: A Primer, Birth Defects Research (Part A): Clinical and Molecular Teratology, 82, 812-821
Winkler, W. E. (1995) Matching and Record Linkage, in B. G. Cox et al. (eds) Business Survey Methods, New York: J. Wiley, 355-384. http://www.fcsm.gov/working-papers/wwinkler.pdf
Winkler, W. E. (2008) Record Linkage References (version of 1 March 2008). http://www.hcp.med.harvard.edu/statistics/survey-soft/docs/WinklerReclinkRef.pdf
Deadlines and Refunds:
Course places are limited and early completion of the registration form is recommended. Payment must be made when submitting the registration form. A full refund will be given for cancellations made at least one calendar month before the course; no refunds can be made for cancellations after this date. Please note that an administration charge of £30 will apply to all cancellations.