quant
quant

Reputation: 23082

Why is the R match function so slow?

The following should find the location of the first instance of the integer 1:

array <- rep(1,10000000)
system.time(match(1,array))

This returns

   user  system elapsed
  0.720   1.243   1.964

If I run the same task using an array of size 100 I get this:

   user  system elapsed
      0       0       0

Since all it should be doing is looking at the first value in the array and returning a match, the time taken should be that of a lookup and a comparison, regardless of the size of the array. If I wrote this in lower-level language it would cost in the order of a handful of clock cycles (a microsecond or less?) regardless of the array size. Why does it take a second in R? It seems to be iterating through he whole array...

Is there a way for it to abort once it has found its match, rather than continuing to iterate unnecessarily?

Upvotes: 6

Views: 1364

Answers (1)

Gabor Csardi
Gabor Csardi

Reputation: 10825

The reason is that R is not actually doing linear search, but it sets up a hash table. This is effective if you are searching for several items, but not very effective if you are searching for one number only. Here is the profiling of the function:

enter image description here

A "better" implementation could use a linear search, if you are searching for a single integer in an array. I guess that would be faster.

Upvotes: 9

Related Questions