Hannie

Reputation: 427

Transform data in Azure Pipeline to make it anonymous

In my new job at a community hall in the Netherlands, we work with databases that contain privacy-sensitive data (e.g. citizen service numbers). They also recently started working with Azure, which I'm getting familiar with as we speak. So this might be a beginner's question, but I hope someone can point me in the right direction: is there a way to retrieve data through a direct connection with a database and make it 'anonymous', for example by hashing or by using a key file of some sort, somewhere in the pipeline? I know that the pipelines are JSON files and that it's possible to do some transformations. I'm curious about the possibilities for doing this in Azure!

** EDIT **

To be clearer: I want to write a piece of code, preferably in the pipeline, that does something like this:

citizen service number person 1
102541220
#generate key/hash somewhere in pipeline of loading in data in azure
anonymous citizen service number, that is specific for person 1
0x10325476

Later, I want to add columns to this database, for example the value of the house this person lives in. I want to be able to 'couple' the databases by using the

anonymous citizen service number 1
0x10325476

Upvotes: 0

Views: 1647

Answers (1)

Alex KeySmith

Reputation: 17091

It sounds like you'd be interested in Azure SQL Database dynamic data masking.

SQL Database dynamic data masking limits sensitive data exposure by masking it to non-privileged users.

Dynamic data masking helps prevent unauthorized access to sensitive data by enabling customers to designate how much of the sensitive data to reveal with minimal impact on the application layer. It’s a policy-based security feature that hides the sensitive data in the result set of a query over designated database fields, while the data in the database is not changed.

For example, a service representative at a call center may identify callers by several digits of their credit card number, but those data items should not be fully exposed to the service representative. A masking rule can be defined that masks all but the last four digits of any credit card number in the result set of any query. As another example, an appropriate data mask can be defined to protect personally identifiable information (PII) data, so that a developer can query production environments for troubleshooting purposes without violating compliance regulations.

https://learn.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started
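
To make the "all but the last four digits" rule from the quoted example concrete, here is a small Python sketch of the effect such a mask has on a value. This only illustrates the result; it is not how dynamic data masking is configured (that happens in the database itself), and the function name `mask_all_but_last_four` is made up for this example.

```python
def mask_all_but_last_four(value: str, mask_char: str = "X") -> str:
    """Replace every character except the last four with mask_char."""
    visible = min(4, len(value))          # short values stay fully visible
    hidden = len(value) - visible
    return mask_char * hidden + value[-visible:] if visible else ""


# The citizen service number from the question, used purely as sample input.
masked = mask_all_but_last_four("102541220")
print(masked)  # XXXXX1220
```

Note that a mask like this is only a view on the data: the stored value is unchanged.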

This won't anonymise the data irreversibly: anyone who has the right permissions in SQL Server can still see the original values.

It will, however, allow you to do joins inside SQL Server without exposing the personal data back out.
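
If you do want a stable anonymous identifier like the one described in the question's edit, one common alternative to masking is a keyed hash. The following is a minimal Python sketch, assuming you can run a script step somewhere in your loading process and that the key is kept in a secret store; the function name `pseudonymise` and the key value are placeholders, not part of any Azure API. Strictly speaking this is pseudonymisation rather than anonymisation, since anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac


def pseudonymise(citizen_service_number: str, secret_key: bytes) -> str:
    """Return a deterministic keyed hash (HMAC-SHA256) of the number, as hex."""
    digest = hmac.new(
        secret_key,
        citizen_service_number.encode("utf-8"),
        hashlib.sha256,
    )
    return digest.hexdigest()


# Placeholder key for illustration only; load the real key from a secret store.
SECRET_KEY = b"replace-with-a-key-from-a-secret-store"

person_1 = pseudonymise("102541220", SECRET_KEY)
person_1_again = pseudonymise("102541220", SECRET_KEY)
person_2 = pseudonymise("102541221", SECRET_KEY)

# The same input and key always give the same pseudonym, so tables
# loaded at different times can still be joined on this column.
print(person_1 == person_1_again)  # True
print(person_1 == person_2)        # False
```

A keyed hash is used here rather than a plain hash because citizen service numbers come from a small, guessable range; without a secret key, someone could hash every possible number and reverse the mapping.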

Upvotes: 0
