Reputation: 103
I have a table with a comments column that holds free text: user feedback and comments. I want to mask/de-identify it using the Google Cloud Data Loss Prevention (DLP) API. When I try to de-identify it, the DLP API masks the whole content of the comments column, not just the sensitive part. Example: if the column contains 'My email id is [email protected]', the output I get is '** **** ** **...'
During sampling, the DLP API does identify the email address in the comments column as sensitive data.
I went through the following examples: free text - https://cloud.google.com/dlp/docs/deidentify-sensitive-data and table - https://cloud.google.com/dlp/docs/examples-deid-tables. However, I am looking for an example where the free text is a column of a table, and I want to submit the whole table in the DLP request rather than the free text separately. Is any special handling required to achieve this?
Upvotes: 0
Views: 624
Reputation: 573
You can do this with the projects.content.deidentify method.
For example, the following sample table has 2 columns, one of which is a free text column.
S_No | Free_text_column |
---|---|
1 | My email is [email protected] |
2 | No Feedback |
3 | [email protected] |
To de-identify the email IDs in the Free_text_column, follow the steps below.
The sample REST API call needs the resources described below. Replace the “project-ID” value in the “parent” field with your project ID and execute the call. In the output, the email IDs in the Free_text_column are replaced with the string "#####".
Next, create the “item” resource. Since the input is a table, configure the “item.table” field with headers (column names) and rows (the values for each column). Refer to ContentItem for more information.
Lastly, create the “inspectConfig” resource, which describes the scanning process. The “inspectConfig.infoTypes” field takes the infoTypes that we want to de-identify; in our case it is “EMAIL_ADDRESS”. Refer to InspectConfig for more information.
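As a rough sketch of how these resources might fit together, the Python snippet below builds a JSON request body for projects.content.deidentify. It is not the original sample call. The helper name `build_deidentify_request` and the address `user@example.com` are hypothetical placeholders, and the “deidentifyConfig” part is an assumption: it uses an infoType transformation with `replaceConfig`, which replaces only the matched findings inside each cell, to match the "#####" output described above.

```python
import json


def build_deidentify_request(headers, rows, info_types, replacement):
    """Build a request body for projects.content.deidentify (hypothetical helper)."""
    info_type_list = [{"name": name} for name in info_types]
    return {
        # The whole table is submitted as a single content item.
        "item": {
            "table": {
                "headers": [{"name": header} for header in headers],
                "rows": [
                    {"values": [{"stringValue": str(cell)} for cell in row]}
                    for row in rows
                ],
            }
        },
        # Which infoTypes the scan should look for.
        "inspectConfig": {"infoTypes": info_type_list},
        # Assumption: replace each finding with a fixed string instead of
        # transforming the entire field value.
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [
                    {
                        "infoTypes": info_type_list,
                        "primitiveTransformation": {
                            "replaceConfig": {
                                "newValue": {"stringValue": replacement}
                            }
                        },
                    }
                ]
            }
        },
    }


# Placeholder data; user@example.com stands in for a real address.
body = build_deidentify_request(
    headers=["S_No", "Free_text_column"],
    rows=[
        [1, "My email is user@example.com"],
        [2, "No Feedback"],
        [3, "user@example.com"],
    ],
    info_types=["EMAIL_ADDRESS"],
    replacement="#####",
)
print(json.dumps(body, indent=2))
```

The resulting body would be sent as a POST to `https://dlp.googleapis.com/v2/projects/project-ID/content:deidentify` with suitable credentials. If only some columns should be scanned, a record transformation limited to those fields may be a better fit; check the DeidentifyConfig reference before relying on this layout.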
Upvotes: 0