cmo

Reputation: 4104

R: table function surprisingly slow

> class(v)
[1] "numeric"
> length(v)
[1] 80373285   # 80 million

The entries of v are integers uniformly distributed between 0 and 100.

> ptm  <-  proc.time()
> tv   <-  table(v)
> show(proc.time() - ptm)
   user  system elapsed 
 96.902   0.807  97.761 

Why is the table function so slow on this vector?

Is there a faster function for this simple operation?

By comparison, the bigtable function from bigtabulate is fast:

> library(bigtabulate)
> ptm  <-  proc.time() ;  bt <- bigtable(x = matrix(v,ncol=1), ccols=1) ; show(proc.time() - ptm)
   user  system elapsed 
  4.163   0.120   4.286 

While bigtabulate is a good solution, it seems unwieldy to resort to a special package just for this simple operation. There is also some overhead, because I have to contort the vector into a one-column matrix to make it work with bigtable. Shouldn't there be a simpler, faster solution in base R?

For whatever it's worth, the base R function cumsum is extremely fast even on this long vector:

> ptm  <-  proc.time() ; cs   <-  cumsum(v) ; show(proc.time() - ptm)
   user  system elapsed 
  0.097   0.117   0.214 

Upvotes: 8

Views: 1701

Answers (1)

Zheyuan Li

Reputation: 73315

Because table calls factor first, which is expensive on a vector this long. Try tabulate if all your entries are integers. Note that you need to add 1 so that the values start from 1 rather than 0, because tabulate ignores non-positive values.
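For the question's data (whole numbers between 0 and 100 stored as doubles), a minimal sketch of this approach could look like the following; the name tv2 is just illustrative, and the + 1 shift puts the value 0 into bin 1:

> tv2 <- tabulate(v + 1, nbins = 101)   # counts for the values 0, 1, ..., 100
> names(tv2) <- 0:100                   # label the bins the way table(v) would

Because tabulate bins integer codes in a single pass, without building the factor levels and character names that table constructs, it should run in a small fraction of the time.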

Upvotes: 12
