Confidentiality and anonymisation
The storage and use of research data (including research participants’ personal data) must comply with the University’s Code of Practice on Data Protection and the University’s Information Protection Policy. If it is possible to identify a living individual from the data, either directly or indirectly, the data is classified as personal data.
When planning a research project, think about what personal data needs to be collected and why. Don’t collect personal data you don’t need. For example, if you don’t need participants’ exact date of birth, ask them to indicate their age range or year of birth instead. On screening questionnaires (eg PAR-Qs), group the exclusion criteria where possible so that participants can give a single yes/no answer without specifying which criteria apply. This reduces the sensitivity of the data and is less likely to cause embarrassment for research participants.
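Where a date of birth has already been collected, the same principle can be applied before the data is stored or shared. The following is a minimal Python sketch, not a prescribed method: the function names `year_of_birth` and `age_band` and the ten-year band width are assumptions made for illustration.

```python
from datetime import date


def year_of_birth(dob: date) -> int:
    """Keep only the year, discarding day and month."""
    return dob.year


def age_band(dob: date, on: date, width: int = 10) -> str:
    """Return an age range such as '30-39' for the reference date `on`.

    The band width of 10 years is an illustrative assumption.
    """
    # Whole years of age: subtract one if the birthday has not yet occurred.
    age = on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


if __name__ == "__main__":
    dob = date(1984, 6, 15)
    print(year_of_birth(dob))
    print(age_band(dob, on=date(2024, 6, 14)))
```

Storing only the band or the year means the exact date of birth never needs to appear in the research dataset.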
To anonymise data means to take away any means of identifying individuals from the data, whereas to keep something confidential means not sharing it (or sharing it within limited, pre-agreed channels). For example, interview transcripts could be confidential but the participants could be anonymised in the published research findings.
Pseudonymisation takes the most identifying characteristics of the data and replaces them with one or more artificial identifiers, or pseudonyms, for example by replacing a name with a unique number. This makes research participants less identifiable, but it is still possible, with a cipher, to identify individuals.
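As an illustration of replacing names with artificial identifiers, the sketch below returns the pseudonymised records and the cipher as two separate objects. It assumes records are held as Python dictionaries; the function name `pseudonymise` and the field names `name` and `participant_id` are hypothetical, and random identifiers are only one possible choice.

```python
import secrets


def pseudonymise(records, name_field="name"):
    """Return (pseudonymised_records, key).

    `key` maps each pseudonym back to the original name. It is the
    cipher and should be stored separately from the research data.
    """
    name_to_id = {}
    pseudonymised = []
    for record in records:
        name = record[name_field]
        if name not in name_to_id:
            # Random identifier, so the ID reveals nothing about the person.
            new_id = secrets.token_hex(4)
            while new_id in name_to_id.values():
                new_id = secrets.token_hex(4)
            name_to_id[name] = new_id
        # Copy every field except the name, then attach the pseudonym.
        cleaned = {k: v for k, v in record.items() if k != name_field}
        cleaned["participant_id"] = name_to_id[name]
        pseudonymised.append(cleaned)
    key = {pid: name for name, pid in name_to_id.items()}
    return pseudonymised, key


if __name__ == "__main__":
    data, key = pseudonymise([
        {"name": "Alice Example", "answer": "yes"},
        {"name": "Bob Example", "answer": "no"},
    ])
    print(data)
```

Anyone holding both outputs can re-identify participants, which is why the data remains personal data for as long as the key exists.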
Think about what you are going to do with the data and/or research results. Avoid promising confidentiality on the information sheet or consent form if the data (or quotations) will be shared anonymously. Make sure that any possibility of deductive disclosure of identities (that is, when someone is nominally anonymised but recognised because of other information such as a rare job title) has been identified and addressed. Participants should be made aware of any limits to anonymisation.
The process of anonymisation involves removing aspects of data from which a living person can be identified:
Direct identifiers include: name, initials, contact details (including partial postcodes), IP addresses, photos, videos, audio recordings, unique identifying numbers (eg car registration plates, NHS numbers), dates relating to an individual.
Indirect identifiers include: gender, location, socioeconomic data, ethnicity, unusual details (eg rare disease, behaviour), small denominators, very small numerators. These may present a risk when they appear in combination with others in the list.
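One way to look for risky combinations of indirect identifiers is to count how many participants share each combination of values and flag the small groups. The sketch below is illustrative only: the function name `small_groups`, the field names and the threshold of 5 are assumptions, not values set by this guidance.

```python
from collections import Counter


def small_groups(records, fields, threshold=5):
    """Return combinations of `fields` shared by fewer than `threshold` records.

    The default threshold of 5 is an illustrative assumption.
    """
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return {combo: n for combo, n in counts.items() if n < threshold}


if __name__ == "__main__":
    records = (
        [{"gender": "F", "location": "Leeds"}] * 6
        + [{"gender": "M", "location": "York"}]
    )
    # The single ("M", "York") participant is flagged as a small group.
    print(small_groups(records, ["gender", "location"]))
```

A flagged combination may need to be generalised (for example, a broader location) or suppressed before the data is shared.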
It might not be possible to identify individual participants’ data yourself once all the identifiers have been removed. If the data is going to be fully anonymised, make sure participants are aware of any limits to withdrawal beforehand. If withdrawal is only possible up until a certain point, a clear deadline for withdrawal should be provided in the participant information.
Further guidance on anonymity, confidentiality and pseudonymisation:
- The University’s Information Protection Policy defines and specifies the controls needed for confidential and highly confidential material.
- MRC guidance on identifiability, anonymisation and pseudonymisation – new April 2019
- UKDA guidance
- JISC guidance
- SDDU research ethics training courses
- Protocol on Data protection, anonymisation and storage and sharing of research data
- Research data guidelines
- UK Data Archive guidance on anonymisation
- Research Ethics Guidebook – guidance on confidentiality