Reputation: 23082
The following should find the location of the first instance of the integer 1
:
array <- rep(1,10000000)
system.time(match(1,array))
This returns
user system elapsed
0.720 1.243 1.964
If I run the same task using an array of size 100
I get this:
user system elapsed
0 0 0
Since all it should be doing is looking at the first value in the array and returning a match, the time taken should be that of a lookup and a comparison, regardless of the size of the array. If I wrote this in lower-level language it would cost in the order of a handful of clock cycles (a microsecond or less?) regardless of the array size. Why does it take a second in R? It seems to be iterating through he whole array...
Is there a way for it to abort once it has found its match, rather than continuing to iterate unnecessarily?
Upvotes: 6
Views: 1364
Reputation: 10825
The reason is that R is not actually doing linear search, but it sets up a hash table. This is effective if you are searching for several items, but not very effective if you are searching for one number only. Here is the profiling of the function:
A "better" implementation could use a linear search, if you are searching for a single integer in an array. I guess that would be faster.
Upvotes: 9