Shubham Gupta

Reputation: 660

get indices of keys in data.table

Is there a fast way to get the row indices of a value in a data.table? I have set a column as the key, but I am struggling to find an efficient way to get the indices of the matching rows.

library(data.table)
x <- sample(letters, 200, replace = TRUE)
y <- rnorm(200)
DT <- data.table(x, y, key = "x")
df <- data.frame(x, y)

Execution time:

system.time(for(i in 1:1000) DT[.("g"), which = TRUE]) # 0.3 sec
system.time(for(i in 1:1000) which(DT$x == "g")) # 0.004 sec
system.time(for(i in 1:1000) which(df$x == "g")) # 0.004 sec

I guess the last two calls do not use the key to find the indices, yet they are much faster than the keyed lookup. Is there a faster way that does use the key?

Upvotes: 2

Views: 237

Answers (1)

Cole

Reputation: 11255

You seem to be running into two things: 1) the fixed time it takes to call [.data.table at all, and 2) likely a lot of overhead to set up the join operation, which dominates when there are only 200 rows. Going up to 2,000,000 rows, DT[.("g"), which = TRUE] becomes much faster than which(DT$x == "g").
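Before comparing timings, it may help to confirm that the two expressions return the same row numbers. Setting the key sorts DT by x, so both refer to the same row order. This is a minimal sketch; the set.seed() call is my own addition for reproducibility and is not part of the original post.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the check is reproducible
x <- sample(letters, 200, replace = TRUE)
y <- rnorm(200)
DT <- data.table(x, y, key = "x")  # setting the key sorts DT by x

idx_join <- DT[.("g"), which = TRUE]  # keyed lookup (binary search on the key)
idx_scan <- which(DT$x == "g")        # plain vector scan over the column

# Both should describe the same rows of the keyed table
all.equal(idx_join, idx_scan)
```

So the choice between the two is about speed only, not about the result.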

library(data.table)
x <- sample(letters, 200, replace = TRUE)
y <- rnorm(200)
DT <- data.table(x, y, key = "x")
bench::mark(which(DT$x == "g"),
            DT[.("g"), which = TRUE])

## 200 rows:

## # A tibble: 2 x 13
##   expression                   min  median `itr/sec` mem_alloc
##   <bch:expr>               <bch:t> <bch:t>     <dbl> <bch:byt>
## 1 which(DT$x == "g")         7.9us  11.2us    88385.    1.66KB
## 2 DT[.("g"), which = TRUE] 735.8us 905.8us     1010.   64.73KB

## 20,000 rows:

## # A tibble: 2 x 13
##   expression                 min median `itr/sec` mem_alloc
##   <bch:expr>               <bch> <bch:>     <dbl> <bch:byt>
## 1 which(DT$x == "g")       251us  265us     3654.   159.5KB
## 2 DT[.("g"), which = TRUE] 744us  907us      879.    67.8KB

## 2,000,000 rows:

## # A tibble: 2 x 13
##   expression                   min median `itr/sec` mem_alloc
##   <bch:expr>               <bch:t> <bch:>     <dbl> <bch:byt>
## 1 which(DT$x == "g")       21900us 24.9ms      40.6    15.6MB
## 2 DT[.("g"), which = TRUE]   868us  1.1ms     724.    366.1KB
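The code for the 20,000 and 2,000,000 row runs is not shown above. One way to reproduce them might look like the sketch below. bench_lookup is a hypothetical helper name I made up for illustration, and I am assuming the larger tables were built the same way as the 200 row one. Timings will differ by machine.

```r
library(data.table)

# Hypothetical helper (not from the original answer): build a keyed table
# with n rows and benchmark both lookups on it.
bench_lookup <- function(n) {
  x <- sample(letters, n, replace = TRUE)
  y <- rnorm(n)
  DT <- data.table(x, y, key = "x")
  bench::mark(
    which(DT$x == "g"),
    DT[.("g"), which = TRUE]
  )
}

# Sizes taken from the benchmarks above
results <- lapply(c(200, 20000, 2000000), bench_lookup)
results
```

The vector scan grows with the number of rows, while the keyed lookup stays close to its fixed overhead, which is why the keyed lookup only pays off on larger tables.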

Upvotes: 4
