K D

Reputation: 103

GCP Data Loss Prevention API: how to de-identify a free-text column in a table

I have a table with a free-text comments column that stores user feedback and comments. I want to mask/de-identify it using the Google Cloud Data Loss Prevention (DLP) API. When I try to de-identify it, the DLP API masks the entire content of the comments column rather than only the sensitive content. For example, if the column contains 'My email id is [email protected]', the output I get is '** **** ** **...'

When sampling, the DLP API does identify the email address as the sensitive data in the comments column.

I went through the following examples: free text - https://cloud.google.com/dlp/docs/deidentify-sensitive-data and table - https://cloud.google.com/dlp/docs/examples-deid-tables. However, I am looking for an example where the free text is a column of a specific table, and I want to submit the whole table in the DLP request rather than the free text separately. Is any special handling required to achieve this?

Upvotes: 0

Views: 624

Answers (1)

Gellaboina Ashish

Reputation: 573

  • You can do this with the projects.content.deidentify method.

  • For example, the following sample table has two columns, one of which is a free-text column.

S_No  Free_text_column
1     My email is [email protected]
2     No Feedback
3     [email protected]

To de-identify the email IDs in Free_text_column, follow the steps below.

The sample REST API call has all of the resources described below configured. Replace the “project-ID” value in the “parent” field with your project ID and execute the call. In the output, the email IDs in Free_text_column are replaced with the string "#####".

  1. Start by creating the “DeidentifyConfig” resource as follows:
  • Configure the “recordTransformations.fieldTransformations” field. It takes the column name as input and lets us apply transformations to that column within a table.
  • Next, configure the “infoTypeTransformations.transformations.primitiveTransformation” field, which takes a rule for transforming a value. We want to replace each email ID with the string “#####”, so we use the “replaceConfig” field and assign it the replacement value. Refer to DeidentifyConfig for more information.
  2. Next, create the “item” resource. Since our input is a table, configure the “item.table” field with headers (column names) and rows (the values for each column). Refer to ContentItem for more information.

  3. Lastly, create the “inspectConfig” resource, which describes the scanning process. The “inspectConfig.infoTypes” field takes the infoType that we want to de-identify, in our case “EMAIL_ADDRESS”. Refer to InspectConfig for more information.
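The three steps above can be sketched as a request body for projects.content.deidentify. This is a minimal sketch, not the exact call from the answer: the helper name build_deidentify_request is hypothetical, and the address user@example.com is a placeholder because the addresses in the sample table are obscured. The point to notice is that the field transformation wraps “infoTypeTransformations” rather than holding a “primitiveTransformation” directly, so only the matched findings inside the cell should be replaced, not the whole cell value.

```python
import json

# REST endpoint for projects.content.deidentify (DLP API v2).
DEIDENTIFY_URL = (
    "https://dlp.googleapis.com/v2/projects/{project_id}/content:deidentify"
)


def build_deidentify_request(headers, rows, text_column,
                             info_type="EMAIL_ADDRESS", replacement="#####"):
    """Build the JSON body for de-identifying one free-text column of a table.

    Hypothetical helper for illustration; it only assembles the request body.
    """
    return {
        # Step 2: the whole table is submitted as the content item.
        "item": {
            "table": {
                "headers": [{"name": h} for h in headers],
                "rows": [
                    {"values": [{"stringValue": str(v)} for v in row]}
                    for row in rows
                ],
            }
        },
        # Step 1: transform only findings inside the named column.
        "deidentifyConfig": {
            "recordTransformations": {
                "fieldTransformations": [
                    {
                        "fields": [{"name": text_column}],
                        "infoTypeTransformations": {
                            "transformations": [
                                {
                                    "infoTypes": [{"name": info_type}],
                                    "primitiveTransformation": {
                                        "replaceConfig": {
                                            "newValue": {
                                                "stringValue": replacement
                                            }
                                        }
                                    },
                                }
                            ]
                        },
                    }
                ]
            }
        },
        # Step 3: tell the inspector which infoType to look for.
        "inspectConfig": {"infoTypes": [{"name": info_type}]},
    }


# Placeholder rows mirroring the sample table (the address is an assumption).
sample_rows = [
    [1, "My email is user@example.com"],
    [2, "No Feedback"],
    [3, "user@example.com"],
]
body = build_deidentify_request(
    ["S_No", "Free_text_column"], sample_rows, "Free_text_column"
)

if __name__ == "__main__":
    print(DEIDENTIFY_URL.format(project_id="project-ID"))
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of an authenticated POST to the printed URL, with “project-ID” replaced by your own project ID; authentication is not shown here.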

Upvotes: 0
