user2894356
user2894356

Reputation: 139

R: lapply with data.table slow, is there a faster alternative?

I got a table in which I would like to replace values of a single column using a lookup-table.

Using lapply this works fine. However, with a huge dataset it runs hours.

Is there a faster alternative? Thanks!

MWE:

require(data.table)

# Dummy data. 
dt_lookup <- data.table(cls_old=c(1:5), cls_new=c(5:1)) # Lookup-Table, the real data has different and non-continous entries.
dt <- data.table(cls=c(5:1), data=c(1,2,3,4,5)) # Table in which data shall be replaced depending on the lookup-table.

# Function to get the new column entry for every row based on the lookup-table.
get_new_label <- function(cls) {
  return(dt_lookup[cls_old==cls]$cls_new)
}

# Actual replacement of values.
dt <- dt[,cls:=lapply(cls, get_new_label)]

Upvotes: 1

Views: 386

Answers (2)

Roland
Roland

Reputation: 132706

If I'm not misunderstanding something, you can do a simple join:

dt[dt_lookup, cls := i.cls_new, on = .(cls = cls_old)]
dt
#   cls data
#1:   1    1
#2:   2    2
#3:   3    3
#4:   4    4
#5:   5    5

You should really spend some time and study the data.table vignettes and documentation.

Upvotes: 3

GKi
GKi

Reputation: 39657

You can use match to realize a lookup-table.

i <- match(dt$cls, dt_lookup$cls_old)
j <- !is.na(i)
dt$cls[j] <- dt_lookup$cls_new[i[j]]
dt
#   cls data
#1:   1    1
#2:   2    2
#3:   3    3
#4:   4    4
#5:   5    5

Also have a look at fast R lookup table.

Upvotes: 1

Related Questions