Healthcare companies must scale their anonymization algorithms to protect the patients they serve
In the 1990s, a PhD candidate at MIT took an anonymized data set (health data from which all names, SSNs, and health insurance IDs had been removed) and re-identified most of the records using the Massachusetts voter registration database. Even without names or IDs, the fields that remained in the database, such as ZIP code, sex, and date of birth, were enough to identify more than 70% of the individuals, including the governor.1
The question then becomes: where is the balance between safeguarding information and preserving the utility of that data? If we remove or hash those ZIP codes, sexes, and DOBs, will the database still be useful?
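To make the re-identification risk concrete, here is a minimal sketch of a linkage on ZIP code, sex, and date of birth. Every record, name, and field name below is invented for illustration; it is not the data or the method used in the study described above.

```python
from collections import Counter

# Hypothetical "anonymized" health records: names and IDs are gone,
# but ZIP code, sex, and date of birth remain.
health_records = [
    {"zip": "02138", "sex": "M", "dob": "1950-01-15", "diagnosis": "A"},
    {"zip": "02138", "sex": "F", "dob": "1962-03-14", "diagnosis": "B"},
    {"zip": "02139", "sex": "F", "dob": "1970-09-02", "diagnosis": "C"},
    {"zip": "02139", "sex": "F", "dob": "1970-09-02", "diagnosis": "D"},
]

# Hypothetical public voter roll with the same three fields plus a name.
voter_roll = [
    {"name": "Voter One", "zip": "02138", "sex": "M", "dob": "1950-01-15"},
    {"name": "Voter Two", "zip": "02138", "sex": "F", "dob": "1962-03-14"},
    {"name": "Voter Three", "zip": "02139", "sex": "F", "dob": "1970-09-02"},
    {"name": "Voter Four", "zip": "02139", "sex": "F", "dob": "1970-09-02"},
]

QUASI_IDENTIFIERS = ("zip", "sex", "dob")


def key(row):
    """Build the combination of quasi-identifiers for one row."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def reidentify(records, roll):
    """Link a record to a name when its key is unique in both data sets."""
    record_counts = Counter(key(r) for r in records)
    roll_counts = Counter(key(v) for v in roll)
    names = {key(v): v["name"] for v in roll if roll_counts[key(v)] == 1}
    return [
        (names[key(r)], r["diagnosis"])
        for r in records
        if record_counts[key(r)] == 1 and key(r) in names
    ]


matches = reidentify(health_records, voter_roll)
```

In this toy data, the two records with a unique combination are linked to a name, while the two records that share a combination are not. That is why generalizing or removing these fields lowers the risk, at the cost of some detail.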
The Data Security Landscape
It’s easy to throw your hands up and say “the safest computer is a broken computer”, but new techniques for protecting data are becoming the industry standard.
These include:
- Data masking: Alters values while preserving data relationships across different databases. For example, replacing a character with a symbol.
- Pseudonymization: Replaces private identifiers with pseudonyms or system-generated IDs.
- Generalization: Removes specific identifiers to smooth out the data. For example, removing the house number from an address but keeping the road name.
- Data swapping: Exactly what it sounds like, reshuffling your data.
- Data perturbation: Slightly modifies values, for example by rounding numbers.
- Synthetic data: Uses the statistics from your real data to create a dummy dataset. Populates your dataset with fake numbers that share statistical properties of the original, such as its standard deviation, medians, or linear regression fit.
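Several of the techniques in this list fit in a few lines of code. The sketch below is illustrative only: the function names, the choice to keep the last four characters when masking, and the rounding step of 5 are all assumptions, not recommendations from the article.

```python
import random


def mask(value, visible=4, symbol="*"):
    """Data masking: replace all but the last `visible` characters with a symbol."""
    value = str(value)
    hidden = max(len(value) - visible, 0)
    return symbol * hidden + value[hidden:]


def generalize_address(address):
    """Generalization: drop a leading house number but keep the road name."""
    first, _, rest = address.partition(" ")
    return rest if first.isdigit() and rest else address


def perturb(value, nearest=5):
    """Data perturbation: round a number to the nearest multiple of `nearest`."""
    return nearest * round(value / nearest)


def swap_column(rows, column, seed=None):
    """Data swapping: shuffle one column's values across the rows."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    # Build new rows so the input list is left unchanged.
    return [{**row, column: value} for row, value in zip(rows, values)]
```

For example, `mask("123-45-6789")` returns `"*******6789"` and `generalize_address("42 Main Street")` returns `"Main Street"`. Swapping keeps each column's overall distribution intact while breaking the link between a value and the row it came from.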
The efficacy of each of these methods depends on the size of the database and the business requirements. Stratos recommends protecting your data with pseudonymization at a minimum, as it is the simplest method to implement and doesn’t result in loss of content.
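As one possible way to implement pseudonymization, the sketch below derives a stable pseudonym from an identifier with a keyed hash (HMAC-SHA256) from the Python standard library. The article does not prescribe a specific method, so the function names, the `ssn` field, and the 16-character truncation are assumptions for illustration.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Return a stable pseudonym for `identifier` using HMAC-SHA256.

    `secret_key` is bytes and must be stored separately from the data;
    anyone holding it can recompute pseudonyms from known identifiers.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability (an assumption)


def pseudonymize_rows(rows, field, secret_key):
    """Replace `field` in every row with its pseudonym, leaving other fields as-is."""
    return [{**row, field: pseudonymize(row[field], secret_key)} for row in rows]


# Usage with made-up data and a demo key.
patients = [
    {"ssn": "123-45-6789", "diagnosis": "A"},
    {"ssn": "123-45-6789", "diagnosis": "B"},
    {"ssn": "987-65-4321", "diagnosis": "C"},
]
protected = pseudonymize_rows(patients, "ssn", b"demo-key-not-for-production")
```

The same identifier always maps to the same pseudonym, so records belonging to one patient can still be joined. A keyed hash is used here rather than a plain hash because a small identifier space, such as SSNs, can be brute-forced when no secret is involved.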
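These are the basic methods that every business should consider. A more elaborate approach is called Differential Privacy. Rather than a single algorithm, it is a mathematical framework: calibrated random noise is added to the results of queries so that the output barely changes whether or not any one individual's record is in the dataset. How much noise is needed depends on factors such as how strongly a single record can influence a query and how many queries the data permits. The strength of the guarantee is expressed as a “privacy value” (usually written ε, epsilon), where smaller means more private. If that privacy value is below 1, the data is generally considered relatively secure.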
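A common building block for differential privacy is the Laplace mechanism, which adds random noise scaled to the query's sensitivity divided by the privacy value (epsilon). The sketch below applies it to a simple counting query. The function name, parameters, and sample data are assumptions for illustration, and a production system would rely on a vetted library rather than this code.

```python
import random


def private_count(rows, predicate, epsilon, rng=None):
    """Return a noisy count of the rows matching `predicate` (Laplace mechanism).

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    sensitivity = 1
    scale = sensitivity / epsilon
    # The difference of two independent exponential samples with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


# Usage with made-up data: how many patients are 65 or older?
patients = [{"age": age} for age in range(100)]
noisy = private_count(patients, lambda p: p["age"] >= 65, epsilon=0.5)
```

A smaller epsilon means a larger noise scale, so individual answers are less precise but reveal less about any one patient. Each query spends part of the overall privacy budget, which is why the number of permitted queries matters.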
Human Error
Algorithmic anonymization is only part of this equation.
We are very much desensitized to data breaches. Equifax. Wells Fargo. Facebook. 300 million compromised accounts here, 400 million there. Most people have learned over the years to associate these invasions of privacy with zero negative personal consequences. Intellectually, we know that we should react, but who has the mental energy to worry about something that happens every few months?
People deal with this the same way they deal with every other choice overload: invest their concern in the big glaring privacy violations, and everything below that threshold gets sorted into the worry-later pile.
Part of the reason data anonymization is not a top-of-list topic in every organization is this fatigue. The allure of data insights and the hassle of working with anonymized data combine with years of personal experience in which data breaches are seen as both ubiquitous and unstoppable.
Training and compliance are essential steps in security. Stratos recommends continuous refreshers on data privacy best practices and an annual re-evaluation of the anonymization techniques as the company grows or pivots. Companies should regularly evaluate the specific requirements for granting employees access to protected health information (PHI) to ensure that it is only accessible to employees who require it. Employees who require access as part of their role should be formally trained on the standard rules and procedures for working with PHI and safeguarding sensitive data, and should acknowledge that training at least annually.
A data breach in the healthcare industry carries a much wider impact than one in the financial industry because, in addition to financial records, hospitals and pharma companies keep intellectual property and sensitive patient data.
Establishing effective data protection procedures and understanding the new tools as they become available are essential to your business. A culture of data security awareness can bring peace of mind in our ever-changing world.
Article by Veronica Shlyaptseva