NPK

Reputation: 85

Data anonymization in R

I was trying to run the code below to mask the data in two columns, but it fails with the error shown after it:

setwd("/cloud/project/CX")

Credit_tbl <-read.csv(file = 'Sample_data.csv',sep = ",",stringsAsFactors = FALSE)

anonymize <- function(x, algo="crc32"){
  unq_hashes <- vapply(unique(x), function(object) digest(object, algo=algo), FUN.VALUE="", USE.NAMES=TRUE)
  unname(unq_hashes[x])
}

cols_to_mask <- c("Email","Phone")

Credit_tbl[,cols_to_mask := lapply(.SD, anonymize),.SDcols=cols_to_mask,with=FALSE]

Error:

Error in `[.data.frame`(Credit_tbl, , `:=`(cols_to_mask, lapply(.SD, anonymize)), : unused arguments (.SDcols = cols_to_mask, with = FALSE)

Upvotes: 2

Views: 2219

Answers (1)

Ronak Shah

Reputation: 389235

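Credit_tbl is a plain data frame (that is what read.csv returns), but :=, .SD and .SDcols are data.table syntax. The data frame method for [ does not know them, hence the "unused arguments" error.

Convert the data frame to a data.table, wrap cols_to_mask in parentheses on the left of := (with = FALSE is then not needed), and apply the function.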

library(data.table)
library(digest)

cols_to_mask <- c("Email","Phone")

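anonymize <- function(x, algo = "crc32") {
    # Hash each value; empty strings and NAs become "".
    # USE.NAMES = FALSE keeps the original values from being attached as names.
    vapply(x, function(y) if (is.na(y) || y == "") "" else digest(y, algo = algo),
           FUN.VALUE = character(1), USE.NAMES = FALSE)
}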

setDT(Credit_tbl)
Credit_tbl[, (cols_to_mask) := lapply(.SD, anonymize), .SDcols = cols_to_mask]
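
To see why the conversion matters, here is a minimal sketch on a made-up two-row table. The sample values are hypothetical, and toupper stands in for anonymize so that the snippet only needs data.table. It checks the class before and after setDT, then runs the same := call.

```r
library(data.table)

# Hypothetical two-row table; read.csv() returns the same kind of object.
Credit_tbl <- data.frame(
  Email = c("a@example.com", "b@example.com"),
  Phone = c("555-0100", "555-0101"),
  stringsAsFactors = FALSE
)

class(Credit_tbl)           # plain data.frame, so `:=` is not understood here
is.data.table(Credit_tbl)

setDT(Credit_tbl)           # converts in place; no reassignment needed

class(Credit_tbl)
is.data.table(Credit_tbl)

cols_to_mask <- c("Email", "Phone")

# Parentheses make data.table use the column names stored in cols_to_mask.
# toupper is only a placeholder for the real masking function.
Credit_tbl[, (cols_to_mask) := lapply(.SD, toupper), .SDcols = cols_to_mask]
```

Because setDT works by reference, Credit_tbl itself becomes a data.table. If you want to keep the original data frame untouched, as.data.table(Credit_tbl) returns a converted copy instead.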

Without converting to a data.table, you can apply the function with lapply:

Credit_tbl[cols_to_mask] <- lapply(Credit_tbl[cols_to_mask], anonymize)
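
As a self-contained check of the base R route, the sketch below builds a small stand-in for Sample_data.csv (the column values are invented, since the real file is not shown) and masks the two columns. It uses vapply with USE.NAMES = FALSE as one possible way to write anonymize, so that the original values are not carried along as names.

```r
library(digest)

# Hypothetical stand-in for Sample_data.csv; the real contents are not shown in the post.
Credit_tbl <- data.frame(
  Name  = c("A", "B", "C"),
  Email = c("a@example.com", "", "a@example.com"),
  Phone = c("555-0100", NA, "555-0100"),
  stringsAsFactors = FALSE
)

cols_to_mask <- c("Email", "Phone")

# Hash each value; empty strings and NAs are returned as "".
anonymize <- function(x, algo = "crc32") {
  vapply(
    x,
    function(y) if (is.na(y) || y == "") "" else digest(y, algo = algo),
    FUN.VALUE = character(1),
    USE.NAMES = FALSE
  )
}

# Plain data frame assignment: no data.table syntax involved.
Credit_tbl[cols_to_mask] <- lapply(Credit_tbl[cols_to_mask], anonymize)

str(Credit_tbl)
```

Identical inputs produce identical hashes, so rows 1 and 3 still match each other after masking, while the Name column is left as it was.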

Upvotes: 1
