Reputation: 85
I would like to de-identify PII data that is already in BigQuery using Google Cloud DLP, and store the result in another BigQuery table. Is that possible, and if so, how?
Upvotes: 3
Views: 1511
Reputation: 2338
This feature is currently in preview (October 2022). Talk to your Google Cloud sales rep to see if it can be enabled for your project.
Upvotes: 0
Reputation: 2099
The different methods for de-identifying sensitive data in DLP are available through the API. For example, we can use replaceConfig
to change:
My email address is [email protected].
to
My email address is [email-address].
by using an API request like this:
"deidentifyConfig": {
  "infoTypeTransformations": {
    "transformations": [
      {
        "infoTypes": [
          {
            "name": "EMAIL_ADDRESS"
          }
        ],
        "primitiveTransformation": {
          "replaceConfig": {
            "newValue": {
              "stringValue": "[email-address]"
            }
          }
        }
      }
    ]
  }
}
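For reference, here is a minimal Python sketch of the same configuration, assuming the google-cloud-dlp client library is installed and credentials are set up. The helper names (build_email_replace_config, deidentify_text) and the project ID are hypothetical; the field names are the snake_case equivalents of the JSON above.

```python
from typing import Any, Dict


def build_email_replace_config(new_value: str = "[email-address]") -> Dict[str, Any]:
    """Return the replaceConfig shown above, using the Python client's snake_case keys."""
    return {
        "info_type_transformations": {
            "transformations": [
                {
                    "info_types": [{"name": "EMAIL_ADDRESS"}],
                    "primitive_transformation": {
                        "replace_config": {"new_value": {"string_value": new_value}}
                    },
                }
            ]
        }
    }


def deidentify_text(project_id: str, text: str) -> str:
    """Send one string to DLP and return the de-identified version."""
    # Imported lazily so the config helper can be used without the library installed.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(
        request={
            "parent": f"projects/{project_id}/locations/global",
            "deidentify_config": build_email_replace_config(),
            # DLP only transforms the infoTypes it is told to look for.
            "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
            "item": {"value": text},
        }
    )
    return response.item.value
```

Calling deidentify_text("my-project", some_string) with a real project ID should return the string with any detected email address replaced by [email-address].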
So, for your use case, you would need to integrate the de-identification API into a flow that reads from BigQuery, performs the de-identifying transformations, and writes the result back to BigQuery.
Cloud DLP in action is a Google post that covers this and points to Dataflow for this use case. The Reference Architecture shows how it can work and includes example Java classes, which you can modify as needed so that the output is ingested into BigQuery.
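For a small table, the read / de-identify / write flow could be sketched without Dataflow roughly like this. This is only an illustration under several assumptions: the google-cloud-bigquery and google-cloud-dlp libraries are installed, the destination table already exists with STRING columns matching `columns`, and every value is sent to DLP as a string (NULLs become empty strings). The function names and batch_size are made up for the example; DLP limits request size, so rows are sent in batches.

```python
from typing import Any, Dict, Iterable, List


def rows_to_dlp_table(rows: Iterable[Dict[str, Any]], columns: List[str]) -> Dict[str, Any]:
    """Convert row dicts into the DLP table structure (headers plus rows of values)."""
    return {
        "headers": [{"name": c} for c in columns],
        "rows": [
            {"values": [{"string_value": "" if r[c] is None else str(r[c])} for c in columns]}
            for r in rows
        ],
    }


def dlp_table_to_rows(table: Any, columns: List[str]) -> List[Dict[str, str]]:
    """Convert the table in a DLP response back into JSON-style rows for BigQuery."""
    return [
        {c: v.string_value for c, v in zip(columns, row.values)}
        for row in table.rows
    ]


def deidentify_bigquery_table(
    project_id: str,
    source_sql: str,
    destination_table: str,  # e.g. "project.dataset.table" (must already exist)
    columns: List[str],
    deidentify_config: Dict[str, Any],
    inspect_config: Dict[str, Any],
    batch_size: int = 500,  # arbitrary; tune to stay under DLP request limits
) -> None:
    # Imported lazily so the pure helpers above work without the client libraries.
    from google.cloud import bigquery, dlp_v2

    bq = bigquery.Client(project=project_id)
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}/locations/global"

    def flush(batch: List[Dict[str, Any]]) -> None:
        response = dlp.deidentify_content(
            request={
                "parent": parent,
                "deidentify_config": deidentify_config,
                "inspect_config": inspect_config,
                "item": {"table": rows_to_dlp_table(batch, columns)},
            }
        )
        errors = bq.insert_rows_json(
            destination_table, dlp_table_to_rows(response.item.table, columns)
        )
        if errors:
            raise RuntimeError(f"BigQuery insert failed: {errors}")

    batch: List[Dict[str, Any]] = []
    for row in bq.query(source_sql).result():
        batch.append(dict(row.items()))
        if len(batch) >= batch_size:
            flush(batch)
            batch = []
    if batch:
        flush(batch)
```

For anything beyond a small table, a Dataflow pipeline is likely the better fit, since it handles batching, retries, and parallelism for you.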
Upvotes: 1
Reputation: 995
Currently, the main recommendation is to use Dataflow.
https://github.com/GoogleCloudPlatform/dlp-dataflow-deidentification
Upvotes: 1
Reputation: 3616
As a quick workaround, I would consider moving the tables with PII into a dataset with restricted access. Then, in a new dataset, create a view that does not include the sensitive columns. Give users query access to only the dataset with the view, and not the private dataset.
https://cloud.google.com/bigquery/docs/share-access-views
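As a rough sketch of the view step, the DDL could be generated like this. The table, view, and column names are placeholders, and build_redacted_view_ddl is a hypothetical helper, not part of any library. It relies on BigQuery's SELECT * EXCEPT syntax to drop the sensitive columns.

```python
from typing import List


def build_redacted_view_ddl(source_table: str, view_name: str, sensitive_columns: List[str]) -> str:
    """Build a CREATE VIEW statement that exposes every column except the sensitive ones."""
    if not sensitive_columns:
        raise ValueError("sensitive_columns must not be empty")
    excluded = ", ".join(f"`{c}`" for c in sensitive_columns)
    return (
        f"CREATE OR REPLACE VIEW `{view_name}` AS "
        f"SELECT * EXCEPT ({excluded}) FROM `{source_table}`"
    )


# Placeholder names; the view lives in a different dataset than the private table.
ddl = build_redacted_view_ddl(
    "my-project.private_dataset.users",
    "my-project.shared_dataset.users_view",
    ["email", "phone"],
)
print(ddl)
```

The statement can then be run in the BigQuery console or with the Python client, e.g. bigquery.Client().query(ddl).result(). The view still has to be authorized on the private dataset as described in the linked page.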
Upvotes: 0