Researchers have advised policymakers to redefine what constitutes truly anonymous data after a large-scale study from Imperial College London found that so-called anonymised datasets, once sold on to third parties such as advertising companies and data brokers, can be reverse-engineered with machine learning techniques to re-identify individuals.
The problem is that once data is deemed anonymised, it is no longer subject to data protection regulations. Worryingly, the study estimated that 99.98 per cent of Americans would be correctly re-identified in any dataset using only 15 characteristics, including age, gender and marital status.
The researchers have developed a tool – aimed, for now, at those with a UK or US postcode – that calculates how vulnerable you are to re-identification. “If your employer or your neighbour finds someone matching your date of birth, gender, and zip code in an ‘anonymous’ health dataset, that person would be you x per cent of the time,” says the app.
It's not rocket science. The study's first author, Dr Luc Rocher, explains: "While there might be a lot of people who are in their 30s, male, and living in New York City, far fewer of them were also born on the 5th of January, are driving a red sports car, and live with two kids (both girls) and one dog."
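Rocher's point can be illustrated with a simple count. The sketch below is not the study's statistical model; it only counts, in a tiny invented dataset, how many records are the sole match for a given combination of attributes. The records, the attribute names and the `unique_fraction` helper are all hypothetical and exist purely for illustration.

```python
from collections import Counter

# Hypothetical toy records: every name and value here is invented for illustration.
RECORDS = [
    {"age_band": "30s", "gender": "M", "city": "NYC", "birthday": "01-05", "car": "red"},
    {"age_band": "30s", "gender": "M", "city": "NYC", "birthday": "01-05", "car": "blue"},
    {"age_band": "30s", "gender": "M", "city": "NYC", "birthday": "03-17", "car": "red"},
    {"age_band": "30s", "gender": "M", "city": "NYC", "birthday": "03-17", "car": "red"},
    {"age_band": "30s", "gender": "F", "city": "NYC", "birthday": "01-05", "car": "red"},
    {"age_band": "40s", "gender": "F", "city": "NYC", "birthday": "07-22", "car": "blue"},
]


def unique_fraction(records, attributes):
    """Return the share of records whose combination of `attributes`
    appears exactly once in `records` (i.e. records that are singled out)."""
    keys = [tuple(record[name] for name in attributes) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


# Add attributes one group at a time and watch the unique share change.
for attrs in (
    ["age_band"],
    ["age_band", "gender", "city"],
    ["age_band", "gender", "city", "birthday", "car"],
):
    print(attrs, round(unique_fraction(RECORDS, attrs), 2))
```

In this toy data the share of singled-out records rises as attributes are added, which is the effect the quote describes: each extra detail splits the crowd into smaller groups until many people stand alone.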
[ https://www.nature.com/articles/s41467-019-10933-3 ]