Reputation: 1664
Here's the code for the issue:
library(data.table)
y <- 1e7
renamer <- function(text){
  text[grep("ac", text)] <- "aaa"
  text[grep("gf", text)] <- "bbb"
  text[grep("er", text)] <- "ccc"
  text[grep("hy", text)] <- "ddd"
  text[grep("nh", text)] <- "eee"
  text[grep("oi", text)] <- "fff"
  text[grep("nu", text)] <- "ggg"
  text[grep("vf", text)] <- "hhh"
  text[grep("cd", text)] <- "iii"
  text[grep("po", text)] <- "jjj"
  text  # return the modified vector
}
# 100 random 15-letter strings
smp <- NULL
for(i in 1:100){
  smp <- c(smp, paste0(sample(letters, 15, TRUE), collapse = ""))
}
# 1e7 rows drawn from those 100 values
df <- data.table(a = sample(smp, y, TRUE))
# > system.time(renamer(text = df$a))
# user system elapsed
# 15.54 0.08 15.70
Problem: I have a large data set in which most of the values need to be replaced, and this has to be done time-efficiently. My code does the job, but I could really use a faster solution.
Note that the values recur (many rows share the same value). And, as sometimes happens, while I was writing this question I came up with a probable solution: convert the column to a factor and replace the level values. I decided to post the question anyway, since someone else might need help with this problem, or there may be a cleverer alternative.
Here is the factor solution, for benchmarking:
# > system.time({
# + df$a <- factor(df$a)
# + levels(df$a) <- renamer(levels(df$a))
# + df$a <- as.character(df$a)
# + })
# user system elapsed
# 1.25 0.14 1.42
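The factor version is fast because the column holds at most 100 distinct strings spread over 1e7 rows, so `renamer` only has to run on the levels instead of on every element. The same idea can be written without a factor by renaming the distinct values and mapping them back onto the rows with `match()`. A minimal base-R sketch, assuming `renamer` returns its modified input (as in the code above); `renamer_fast` is a hypothetical name, not part of the question:

```r
# Same renamer as in the question, returning the modified vector.
renamer <- function(text) {
  text[grep("ac", text)] <- "aaa"
  text[grep("gf", text)] <- "bbb"
  text[grep("er", text)] <- "ccc"
  text[grep("hy", text)] <- "ddd"
  text[grep("nh", text)] <- "eee"
  text[grep("oi", text)] <- "fff"
  text[grep("nu", text)] <- "ggg"
  text[grep("vf", text)] <- "hhh"
  text[grep("cd", text)] <- "iii"
  text[grep("po", text)] <- "jjj"
  text
}

# Hypothetical helper: run the (slow) renamer only on the distinct
# values, then expand the result back to one entry per row.
renamer_fast <- function(x, f = renamer) {
  u <- unique(x)        # distinct values, in order of first appearance
  f(u)[match(x, u)]     # position of each row's value within u
}

x <- c("xxacxx", "plain", "xxacxx", "gfpo")
renamer_fast(x)
# returns c("aaa", "plain", "aaa", "bbb")
```

On the question's data this would be called as `df$a <- renamer_fast(df$a)`. Unlike the factor route, it never creates a factor, so there is no final `as.character()` step.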
Upvotes: 3
Views: 183
Reputation: 92282
I would suggest creating a simple lookup table and using the excellent stringi::stri_detect_fixed function (this gives me a ~100x speedup):
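library(stringi)
library(data.table)
# patterns to look for, and the label each one maps to ("aaa", "bbb", ..., "jjj")
Lookup <- c("ac", "gf", "er", "hy", "nh", "oi", "nu", "vf", "cd", "po")
Rename <- substring(paste(rep(letters[1:10], each = 3), collapse = ""),
                    seq(1, 30, 3), seq(3, 30, 3))
# by = a evaluates the lookup once per distinct value of a
system.time(setDT(df)[, Result := Rename[stri_detect_fixed(a, Lookup)], by = a])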
# user system elapsed
# 0.10 0.05 0.14
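Two details of this one-liner are worth checking against the question's `renamer`. `Rename[stri_detect_fixed(a, Lookup)]` returns every matching label, so a value containing two of the patterns yields more than one label, whereas `renamer` effectively keeps the first match in `Lookup` order (its replacements `aaa` to `jjj` contain none of the patterns, so once a value is renamed no later pattern can match it). And values matching no pattern get no label from the lookup, while `renamer` leaves them unchanged. Below is a base-R sketch of the same lookup-table idea that mirrors those semantics, using `grepl(fixed = TRUE)` in place of stringi; `rename_lookup` is a hypothetical helper, and keeping unmatched values unchanged is an assumption chosen to match `renamer`. `strrep(letters[1:10], 3)` builds the same `Rename` vector as the `substring()` call.

```r
# Patterns and their replacement labels, as in the answer.
Lookup <- c("ac", "gf", "er", "hy", "nh", "oi", "nu", "vf", "cd", "po")
Rename <- strrep(letters[1:10], 3)   # "aaa", "bbb", ..., "jjj"

# Hypothetical helper: the first pattern in `lookup` that occurs in a value
# decides its label, unmatched values are kept as-is, and the matching only
# runs on the distinct values.
rename_lookup <- function(x, lookup = Lookup, rename = Rename) {
  u <- unique(x)
  out <- u
  done <- logical(length(u))         # TRUE once a value has been renamed
  for (k in seq_along(lookup)) {
    hit <- !done & grepl(lookup[k], u, fixed = TRUE)
    out[hit] <- rename[k]
    done <- done | hit
  }
  out[match(x, u)]                   # expand back to one entry per row
}

rename_lookup(c("xxacxx", "plain", "gfpo", "xxacxx"))
# returns c("aaa", "plain", "bbb", "aaa")
```

It can be applied to the question's data as `df$a <- rename_lookup(df$a)`.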
Upvotes: 2