lawyeR

Reputation: 7654

Create anonymous names for each unique factor level (e.g., companies)

The goal is to create and use anonymous names for firms. Doing so makes it possible to distribute samples of plots without disclosing proprietary information about specific firms.

The toy data frame shows that a firm can appear multiple times and that the names of different firms vary in unpredictable ways. The code below works, but it seems laborious and error-prone.

Is there a more efficient way to create a new variable that holds an anonymous replacement name for each firm?

df <- data.frame(firm = c(rep("Alpha LLC",3), "Baker & Charlie", rep("Delta and Associates", 2), "Epsilon", "The Gamma Firm"), fees = rep(100, 8))

# create a translation table (named vector) where each firm has a unique "name" of the form "Firm LETTER number"

uniq <- as.character(unique(df$firm))
uniq.df <- data.frame(firmname = uniq, anonfirm = paste0("Firm ", LETTERS[seq_along(uniq)], seq_along(uniq)), stringsAsFactors = FALSE)

# create a "named vector" with firm on top (as names) and anonymous name on bottom

translation.vec <- uniq.df[ , 2]  # the anonymous firm names
names(translation.vec) <- uniq.df[ , 1] # original firm names become the element names

df$anon <- translation.vec[as.character(df$firm)] # look up each firm by name; as.character() avoids indexing by factor codes

> df
                  firm fees    anon
1            Alpha LLC  100 Firm A1
2            Alpha LLC  100 Firm A1
3            Alpha LLC  100 Firm A1
4      Baker & Charlie  100 Firm B2
5 Delta and Associates  100 Firm C3
6 Delta and Associates  100 Firm C3
7              Epsilon  100 Firm D4
8       The Gamma Firm  100 Firm E5

Upvotes: 1

Views: 878

Answers (2)

Mhairi McNeill

Reputation: 1981

Expanding on @LaurenGoodwin's very smart comment:

You can convert the names to a factor and then to numeric, which gives each company a distinct number.

companies <- LETTERS

anon <- as.numeric(as.factor(companies))

If you wanted it as more than a number, just change to a character and use paste.

anon <- paste('Firm', as.character(anon))

[1] "Firm 1"  "Firm 2"  "Firm 3"  "Firm 4"  "Firm 5"  "Firm 6"  "Firm 7" 
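
Applied to the data frame from the question, the same idea might look like the sketch below. It assumes `df` has a `firm` column as in the question; `df` is rebuilt here only so the block runs on its own. Note that the numbers follow the alphabetical order of the firm names, so this version does not hide that ordering.

```r
# Rebuild the toy data from the question so the example is self-contained
df <- data.frame(
  firm = c(rep("Alpha LLC", 3), "Baker & Charlie",
           rep("Delta and Associates", 2), "Epsilon", "The Gamma Firm"),
  fees = rep(100, 8)
)

# factor() gives each distinct firm one integer code (alphabetical by default);
# paste() turns that code into a label
df$anon <- paste("Firm", as.integer(factor(df$firm)))
```

Every row for the same firm gets the same label, whether `firm` is stored as a character vector or as a factor.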

Upvotes: 1

MrFlick

Reputation: 206242

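When you store your firm names in the data.frame they become a factor (that is the default before R 4.0.0; in later versions the column stays character unless you ask for a factor). It's pretty easy just to swap the names of the levels of your factor. For example

set.seed(15) # so sample() is reproducible
n <- nlevels(factor(df$firm)) # number of distinct firms; factor() also handles a character column
newnames <- paste0("Firm ", LETTERS[seq_len(n)], seq_len(n))
df$anon <- factor(df$firm, labels=sample(newnames))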

Here I just change the labels of the factor. I also throw in a sample(); otherwise the firms would be named in alphabetical order. This produces

                  firm fees    anon
1            Alpha LLC  100 Firm D4
2            Alpha LLC  100 Firm D4
3            Alpha LLC  100 Firm D4
4      Baker & Charlie  100 Firm A1
5 Delta and Associates  100 Firm C3
6 Delta and Associates  100 Firm C3
7              Epsilon  100 Firm B2
8       The Gamma Firm  100 Firm E5

The level order of the new factor still carries some information about the original order of the firms. If you plan to share the R data set itself, rather than save it to a flat text file or just display the information, you can remove that information by converting to character:

df$anon <- as.character(factor(df$firm, labels=sample(newnames)))
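
If you also need to map the anonymous names back to the real ones later, one option is to keep the translation as a separate key table and look names up with `match()`. This is only a sketch of that variation, not part of the answer above: the names `key` and `shared` are made up for illustration, `df` is rebuilt so the block runs on its own, and the `LETTERS` scheme assumes at most 26 distinct firms.

```r
# Rebuild the toy data from the question so the example is self-contained
df <- data.frame(
  firm = c(rep("Alpha LLC", 3), "Baker & Charlie",
           rep("Delta and Associates", 2), "Epsilon", "The Gamma Firm"),
  fees = rep(100, 8)
)

set.seed(15)  # only so the shuffle is reproducible

# One row per distinct firm, with the anonymous labels assigned in random order
firms <- unique(as.character(df$firm))
key <- data.frame(
  firm = firms,
  anon = sample(paste0("Firm ", LETTERS[seq_along(firms)], seq_along(firms))),
  stringsAsFactors = FALSE
)

# match() returns each row's position in the key; use it to pull the label
df$anon <- key$anon[match(df$firm, key$firm)]

# Hypothetical split: distribute `shared`, keep `key` private
shared <- df[, c("fees", "anon")]
```

Because `df$anon` is a plain character vector here, no factor level order travels with the shared data.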

Upvotes: 5
