Reputation: 4104
> class(v)
"numeric"
> length(v)
80373285 # 80 million
The entries of v are integers uniformly distributed between 0 and 100.
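(A vector with these properties can be simulated for reproduction; the sample call below is only an illustration, not the original data.)
> set.seed(1)  # hypothetical reproduction of a vector like v
> v <- as.numeric(sample(0:100, 80373285, replace = TRUE))  # ~80 million draws, stored as doubles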
> ptm <- proc.time()
> tv <- table(v)
> show(proc.time() - ptm)
user system elapsed
96.902 0.807 97.761
Why is the table function so slow on this vector?
Is there a faster function for this simple operation?
By comparison, the bigtable function from bigtabulate is fast:
> library(bigtabulate)
> ptm <- proc.time() ; bt <- bigtable(x = matrix(v,ncol=1), ccols=1) ; show(proc.time() - ptm)
user system elapsed
4.163 0.120 4.286
While bigtabulate is a good solution, it seems unwieldy to resort to a special package just for this simple function. There is also some overhead, because I have to contort the vector into a matrix to make it work with bigtable. Shouldn't there be a simpler, faster solution in base R?
For whatever it's worth, the base R function cumsum is extremely fast even for this long vector:
> ptm <- proc.time() ; cs <- cumsum(v) ; show(proc.time() - ptm)
user system elapsed
0.097 0.117 0.214
Upvotes: 8
Views: 1701
Reputation: 73315
Because it calls factor first. Try tabulate if all your entries are integers. But you need to add 1 first, so that the values start from 1 rather than 0 (tabulate silently ignores entries that are not positive integers).
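For example, a minimal sketch assuming v holds whole numbers 0 through 100 stored as doubles, as in the question:
> tv2 <- tabulate(as.integer(v) + 1L, nbins = 101L)  # shift values 0..100 into bins 1..101
> names(tv2) <- 0:100  # label each count with the value it represents
The counts should agree with table(v); the speed-up comes from skipping the factor conversion.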
Upvotes: 12