Reputation: 7654
The goal is to create and use anonymous names for firms. Doing so makes it possible to distribute samples of plots without disclosing proprietary information about specific firms.
The toy data frame shows that a firm can appear multiple times and that firm names vary in unpredictable ways. The code below works but seems laborious and error-prone.
Is there a more efficient way to rename each firm in a new variable that has an anonymous replacement name?
df <- data.frame(firm = c(rep("Alpha LLC", 3), "Baker & Charlie", rep("Delta and Associates", 2), "Epsilon", "The Gamma Firm"), fees = rep(100, 8)) # 8 rows, each with fees of 100
# create a translation table (named vector) where each firm has a unique "name" of the form "Firm LETTER number"
uniq <- as.character(unique(df$firm))
uniq.df <- data.frame(firmname = uniq, anonfirm = paste0("Firm ", LETTERS[seq_along(uniq)], seq_along(uniq))) # LETTERS covers at most 26 firms
# create a "named vector" with firm on top (as names) and anonymous name on bottom
translation.vec <- as.character(uniq.df[ , 2]) # the anonymous firm names
names(translation.vec) <- uniq.df[ , 1] # original names become the element names
df$anon <- unname(translation.vec[as.character(df$firm)]) # look up by name; as.character() stops a factor being used as integer codes
> df
                  firm fees    anon
1            Alpha LLC  100 Firm A1
2            Alpha LLC  100 Firm A1
3            Alpha LLC  100 Firm A1
4      Baker & Charlie  100 Firm B2
5 Delta and Associates  100 Firm C3
6 Delta and Associates  100 Firm C3
7              Epsilon  100 Firm D4
8       The Gamma Firm  100 Firm E5
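The named-vector lookup above only behaves as intended when the index is a character vector. A small sketch of why, using a made-up two-firm table (`lookup` and `f` are illustrative names, not part of the data above): when a factor is used as an index, R treats it as its integer codes rather than its labels.

```r
# Hypothetical lookup table: names are the originals, values the aliases
lookup <- c("Zeta Inc" = "Firm A1", "Alpha LLC" = "Firm B2")

# Factor levels sort alphabetically: "Alpha LLC" is code 1, "Zeta Inc" is code 2
f <- factor(c("Zeta Inc", "Alpha LLC"))

by_code <- unname(lookup[f])                # indexes by position (integer codes)
by_name <- unname(lookup[as.character(f)])  # indexes by name, as intended
```

Here `by_code` pairs each firm with the wrong alias, while `by_name` returns the intended ones. In the toy data the two happen to agree only because the firms first appear in alphabetical order.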
Upvotes: 1
Views: 878
Reputation: 1981
Expanding on @LaurenGoodwin's very smart comment -
You can convert the names to a factor and then to numeric, which gives each distinct company its own number:
companies <- LETTERS
anon <- as.numeric(as.factor(companies))
If you want more than a bare number, convert it to character and add a prefix with paste():
anon <- paste('Firm', as.character(anon))
[1] "Firm 1" "Firm 2" "Firm 3" "Firm 4" "Firm 5" "Firm 6" "Firm 7"
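Applied to the data frame from the question, the same idea might look like the sketch below. The column names `anon_alpha` and `anon_seen` are made up for illustration, and the `match()` variant is an alternative of mine rather than part of the comment being expanded on.

```r
# Toy data mirroring the question (fees simplified to a constant)
df <- data.frame(
  firm = c(rep("Alpha LLC", 3), "Baker & Charlie",
           rep("Delta and Associates", 2), "Epsilon", "The Gamma Firm"),
  fees = 100
)

# Number firms by alphabetical rank, using the factor's integer codes
df$anon_alpha <- paste("Firm", as.integer(factor(df$firm)))

# Alternative: number firms by order of first appearance
df$anon_seen <- paste("Firm", match(df$firm, unique(df$firm)))
```

Both columns coincide for this toy data because the firms first appear in alphabetical order; on real data they will generally differ.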
Upvotes: 1
Reputation: 206242
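In R versions before 4.0.0, firm names stored in a data.frame become a factor by default (from 4.0.0 on they stay character unless you pass stringsAsFactors = TRUE or wrap the column in factor()). Once you have a factor, it's pretty easy to swap the names of its levels. For example: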
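set.seed(15) # so sample() is reproducible
firm <- factor(df$firm) # no-op for a factor; needed when the column is character (the default in R >= 4.0.0)
newnames <- paste0("Firm ", LETTERS[seq_len(nlevels(firm))], seq_len(nlevels(firm)))
df$anon <- factor(firm, labels = sample(newnames)) # relabel the levels in shuffled order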
Here I just change the labels of the factor. I also throw in a sample(); otherwise the firms would be named in alphabetical order. This produces (the exact assignment depends on the seed and on your R version):
                  firm fees    anon
1            Alpha LLC  100 Firm D4
2            Alpha LLC  100 Firm D4
3            Alpha LLC  100 Firm D4
4      Baker & Charlie  100 Firm A1
5 Delta and Associates  100 Firm C3
6 Delta and Associates  100 Firm C3
7              Epsilon  100 Firm B2
8       The Gamma Firm  100 Firm E5
The levels of the new factor are stored in the same order as the original levels, so they still carry some information about the alphabetical order of the firms. If you plan to share the R data set itself, rather than save it to a flat text file or just display the information, you can remove that information by converting to character:
df$anon <- as.character(factor(df$firm, labels=sample(newnames)))
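If you need this in more than one place, the steps could be wrapped in a small helper. This is only a sketch: `anonymize_firms()` and its `prefix` argument are hypothetical names, and it numbers firms instead of using LETTERS so that it is not limited to 26 firms.

```r
# Hypothetical helper: returns anonymised labels plus the private key
anonymize_firms <- function(x, prefix = "Firm ") {
  originals <- sort(unique(as.character(x)))
  # Shuffle the aliases so they do not follow alphabetical order
  aliases <- sample(paste0(prefix, seq_along(originals)))
  key <- setNames(aliases, originals)  # names = originals, values = aliases
  list(anon = unname(key[as.character(x)]), key = key)
}

set.seed(15)  # only so the shuffle is repeatable
firms <- c(rep("Alpha LLC", 3), "Baker & Charlie", "Epsilon")
res <- anonymize_firms(firms)
res$anon  # plain character vector, safe to share
res$key   # original-to-alias mapping, keep private
```

Returning the key separately lets you translate back later without storing the original names alongside the shared data.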
Upvotes: 5