Reputation: 139
I got a table in which I would like to replace values of a single column using a lookup-table.
Using lapply this works fine. However, with a huge dataset it runs hours.
Is there a faster alternative? Thanks!
MWE:
require(data.table)
# Dummy data.
dt_lookup <- data.table(cls_old=c(1:5), cls_new=c(5:1)) # Lookup-Table, the real data has different and non-continous entries.
dt <- data.table(cls=c(5:1), data=c(1,2,3,4,5)) # Table in which data shall be replaced depending on the lookup-table.
# Function to get the new column entry for every row based on the lookup-table.
get_new_label <- function(cls) {
return(dt_lookup[cls_old==cls]$cls_new)
}
# Actual replacement of values.
dt <- dt[,cls:=lapply(cls, get_new_label)]
Upvotes: 1
Views: 386
Reputation: 132706
If I'm not misunderstanding something, you can do a simple join:
dt[dt_lookup, cls := i.cls_new, on = .(cls = cls_old)]
dt
# cls data
#1: 1 1
#2: 2 2
#3: 3 3
#4: 4 4
#5: 5 5
You should really spend some time and study the data.table vignettes and documentation.
Upvotes: 3
Reputation: 39657
You can use match
to realize a lookup-table.
i <- match(dt$cls, dt_lookup$cls_old)
j <- !is.na(i)
dt$cls[j] <- dt_lookup$cls_new[i[j]]
dt
# cls data
#1: 1 1
#2: 2 2
#3: 3 3
#4: 4 4
#5: 5 5
Also have a look at fast R lookup table.
Upvotes: 1