What is Data Linkage?
Data linkage is a method of bringing together information about the same person or entity from different sources to create a new, richer dataset. Linking information from disparate sources enables the construction of chronological sequences of events and, when used at the macro level, provides valuable information for policy and research into the health and wellbeing of the population.
Data linkage is done by assigning an identifying number to each person in a dataset and storing a set of links to all records for that person. The Tasmanian Data Linkage Unit (TDLU) is responsible for creating and maintaining the links between the main statewide health data collections and other approved data sources in Tasmania. In bringing records together, the TDLU uses strict privacy-preserving policies, protocols and procedures to ensure the security of the data and the confidentiality of the individuals to whom the records relate. The information about an individual is not brought together in one place: it stays in the separate data collections, and the security of, and means of access to, the information in each data source remain unchanged.
The Separation Principle
The key feature of the data-linkage model used by the TDLU is the separation of personal identifying information from service or clinical data. This approach supports National Health and Medical Research Council (NHMRC) protocols that define linked datasets as non-identifiable.
Applying this 'Separation Principle', the TDLU operates under strict protocols, which include:
- Identifying data is provided to the TDLU for linkage only;
- Such data is kept on a standalone computing server with no Internet or Intranet connectivity;
- Access to the room housing the computer is strictly controlled via security card;
- Data stored on the server is encrypted;
- The TDLU holds no clinical data whatsoever; and
- Researchers have no way of accessing the personal identifying data held by TDLU.
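As a rough illustration of the Separation Principle, the Python sketch below splits a custodian's record into an identifying part (the only part that would be sent for linkage) and a content part (which stays with the custodian). The field names, the record ID format and the `split_record` helper are hypothetical, chosen only for illustration; they do not describe the TDLU's actual systems.

```python
# Hypothetical field names; a real data collection defines its own.
IDENTIFYING_FIELDS = {"name", "address", "date_of_birth"}


def split_record(record_id, record):
    """Split one custodian record into demographic and content parts.

    Both parts keep the custodian's own record ID, so the custodian can
    later match linkage results back to its content data.
    """
    demographic = {"record_id": record_id}
    content = {"record_id": record_id}
    for field, value in record.items():
        if field in IDENTIFYING_FIELDS:
            demographic[field] = value
        else:
            content[field] = value
    return demographic, content


# Made-up example record.
record = {
    "name": "Jane Citizen",
    "address": "1 Example St",
    "date_of_birth": "1980-02-03",
    "diagnosis": "asthma",
    "length_of_stay": 2,
}
for_linkage, stays_with_custodian = split_record("HOSP-0001", record)
```

In this sketch, `for_linkage` carries no clinical fields and `stays_with_custodian` carries no identifying fields; the record ID is the only value the two parts share.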
Who is involved in data linkage?
The data linkage process involves three main stakeholders:
- Data Custodians - effectively the 'owners' of data. Data custodians work within an organisation or agency (such as a government department) and are responsible for the collection, use and dissemination of data. Data custodians may manage administrative or research datasets, and collect and store personal information (such as name, address and date of birth) as well as information about the person (e.g. health diagnosis or treatment details).
- Researchers - the people who use the anonymised linked data for the purpose of analysis and research. Research projects undergo an extensive application process and must be approved by a relevant Human Research Ethics Committee (HREC) as well as relevant data custodians.
- Data Linkage Units - the organisations that link datasets together and create Linkage IDs, which allow data from different sources and organisations to be linked together.
A network of Data Linkage Units exists as part of the Population Health Research Network (PHRN), with each State and Territory represented. A further three national Integrating Authorities can perform data linkage within and between Commonwealth and State/Territory data collections. The three accredited Integrating Authorities in Australia are the:
- Australian Bureau of Statistics (ABS)
- Australian Institute of Health and Welfare (AIHW)
- Australian Institute of Family Studies (AIFS)
How is linked data used?
Research using linked data can be both reliable and efficient because it draws on data from the whole population rather than from small samples. The linkages between administrative and research or clinical datasets provide an evidence base that helps policy makers and researchers to better understand population health and wellbeing, and to implement and evaluate service delivery and programs.
Research projects using linked data make use of administrative, survey and research/clinical data that already exist. Using such data minimises the burden on organisations and individuals to provide additional information, and is a cost-effective solution for researchers.
Master Linkage Map (MLM) - The MLM groups together records for individuals in a population. Each individual within the Map has their own unique 'key'.
Master Linkage Key (MLK) - Refers to an individual's unique ID, otherwise known as a 'key'.
Project Person Identifier (PPID) - A project-specific, unique pseudo-identifier supplied to researchers. It refers to an individual while carrying minimal risk of re-identification.
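One way a project-specific identifier of this kind could be produced is sketched below in Python. It assumes that each project is given its own random mapping from MLK to PPID, so that a PPID reveals nothing about the underlying key. The `assign_ppids` helper, the prefix format and the sample keys are hypothetical; the method actually used to generate PPIDs is not described here.

```python
import secrets


def assign_ppids(mlks, prefix):
    """Return a dict mapping each MLK to a fresh random PPID for one project."""
    mapping = {}
    used = set()
    for mlk in sorted(set(mlks)):
        ppid = f"{prefix}-{secrets.token_hex(8)}"
        while ppid in used:  # collisions are very unlikely, but keep PPIDs unique
            ppid = f"{prefix}-{secrets.token_hex(8)}"
        used.add(ppid)
        mapping[mlk] = ppid
    return mapping


# Made-up keys; the same individuals take part in two separate projects.
mlks = [1001, 1002, 1003]
project_a = assign_ppids(mlks, "PROJ-A")
project_b = assign_ppids(mlks, "PROJ-B")
```

Because the PPIDs are drawn at random rather than derived from the MLK, the same individual receives unrelated identifiers in `project_a` and `project_b`. Under this assumption, files released to different projects cannot be joined to one another on the PPID.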
What is the Master Linkage Map (MLM)?
At the centre of the TDLU's system is a Master Linkage Map (MLM), which groups together records for individuals from the Tasmanian population. This 'map' enables the extraction of de-identified linked files drawn from multiple data sources. Once an anonymous person identifier is added, the map can be used for a range of research and planning purposes.
The MLM is a simple structure: it contains a list of individual record IDs, each of which points to a specific record in one of the participating datasets. A Master Linkage Key (MLK) is associated with every record, and all records with the same MLK are considered to belong to the same individual.
Importantly, the MLM does not contain any clinical or service information about individuals. The TDLU only ever receives basic demographic information, and only for the purpose of linking.
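The structure described above can be pictured with a short Python sketch. Each row holds a dataset name, the custodian's record ID and an MLK, and nothing else. The dataset names, ID formats, key values and the `records_by_mlk` helper are all made up for illustration and are not taken from the TDLU's system.

```python
from collections import defaultdict

# Each MLM row: (dataset name, custodian's record ID, Master Linkage Key).
# Note that no clinical or service fields appear anywhere in the map.
mlm_rows = [
    ("hospital_admissions", "HOSP-0001", 1001),
    ("hospital_admissions", "HOSP-0042", 1002),
    ("emergency_department", "ED-0007", 1001),
    ("cancer_registry", "CR-0003", 1002),
    ("emergency_department", "ED-0019", 1003),
]


def records_by_mlk(rows):
    """Group record pointers by MLK; rows sharing an MLK are one individual."""
    grouped = defaultdict(list)
    for dataset, record_id, mlk in rows:
        grouped[mlk].append((dataset, record_id))
    return dict(grouped)


links = records_by_mlk(mlm_rows)
```

Here `links[1001]` lists one hospital record and one emergency department record, which the map treats as belonging to the same person. The records themselves remain in their source collections.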
What does linked data look like?
Linked data is supplied to researchers in a way that ensures an individual cannot be identified. Personal information such as name and address is removed and replaced by a Project Person Identifier (PPID). For each dataset, the Data Custodian provides the requested clinical or service data against each of the PPIDs listed in the dataset. For example, a supplied file might list Year of Birth and Length of Stay against each PPID.
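To illustrate how a researcher might work with files of this kind, the Python sketch below combines two custodian extracts that share only a PPID. All values, the `triage_category` field and the `combine_by_ppid` helper are hypothetical, and the layout of real extracts will differ between data collections.

```python
# Made-up extracts, as two custodians might supply them for one project.
hospital_extract = [
    {"ppid": "P001", "year_of_birth": 1980, "length_of_stay": 2},
    {"ppid": "P002", "year_of_birth": 1975, "length_of_stay": 5},
]
emergency_extract = [
    {"ppid": "P001", "triage_category": 3},
    {"ppid": "P003", "triage_category": 2},
]


def combine_by_ppid(extracts):
    """Collect each person's rows from every named extract under their PPID."""
    combined = {}
    for source, rows in extracts.items():
        for row in rows:
            person = combined.setdefault(row["ppid"], {})
            details = {k: v for k, v in row.items() if k != "ppid"}
            person.setdefault(source, []).append(details)
    return combined


linked = combine_by_ppid(
    {"hospital": hospital_extract, "emergency": emergency_extract}
)
```

In this example `linked["P001"]` holds both a hospital stay and an emergency presentation for the same person, while no name, address or other identifying field appears at any point.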
How do I access linked data?
Access to linked data is subject to a comprehensive application process together with relevant human research ethics approvals. The TDLU is currently taking applications for linked data. Examples of projects underway in Tasmania include:
- Analysis of factors that contribute to Hospital Standardised Mortality Ratio rates in Tasmanian hospitals
- The burden and cost of injury attributable to health care use and mortality in Australia
- Perinatal outcomes and child development (risk and protective factors)
- Factors that impede early access to defibrillation following out of hospital cardiac arrest
- Population level chlamydia testing and positivity rates in Tasmania
- Community presentations of anaphylaxis in Tasmania: Occurrence, management and treatment outcomes