Nick

Reputation: 22175

How can you efficiently check values of large vectors in R?

One thing I want to do all the time in my R code is to test whether certain conditions hold for a vector, such as whether any or all of its values equal some specified value. The R-ish way to do this is to create a logical vector and pass it to any or all, for example:

any(is.na(my_big_vector))
all(my_big_vector == my_big_vector[[1]])
...

It seems really inefficient to me to allocate a big vector and fill it with values, just to throw it away (especially if the any() or all() call could be short-circuited after testing only a couple of the values). Is there a better way to do this, or should I just give up on my desire to write code that is both efficient and succinct when working in R?

Upvotes: 3

Views: 792

Answers (3)

mbq

Reputation: 18628

I think it is not a good idea -- R is a very high-level language, so what you should do is follow the standard idioms. This way R developers know what to optimize. You should also remember that because R is a functional and lazy language, it is even possible that a statement like

any(is.na(a))

can be recognized and executed as something like

.Internal(is_any_na,a)
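
For the missing-value case specifically, base R (3.1.0 and later) provides anyNA(), which is documented as a possibly faster equivalent of any(is.na(x)) and, for atomic vectors, does not need the intermediate logical vector. A minimal sketch, where x is a made-up example vector and not something from the question:

```r
# x is a hypothetical example vector
x <- c(1, NA, rep(3, 1e6))

# Single call; no full-length logical vector is built first
anyNA(x)

# The two-step form from the question, for comparison
any(is.na(x))
```

This only covers the NA test; other conditions such as equality still go through the usual any()/all() idiom.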

Upvotes: 0

Dirk is no longer here

Reputation: 368181

"Cheap, fast, reliable: pick any two" is a dry way of saying that you sometimes need to order your priorities when building or designing systems.

It is rather similar here: the cost of the concise expression is that memory gets allocated behind the scenes. If that really is a problem, then you can always write a (compiled?) routine that runs (quickly) along the vector and uses only a pair of values at a time.

You can trade off memory usage versus performance versus expressiveness, but it is difficult to hit all three at the same time.
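
One way to picture that trade-off without leaving R is to scan the vector in blocks and stop at the first block that contains a hit. The sketch below is only an illustration: any_chunked, pred and chunk_size are hypothetical names, and a compiled routine would likely be faster still.

```r
# Hypothetical helper: TRUE if pred() holds for any element of x.
# Scans x in blocks of chunk_size, so each block's logical vector has
# at most chunk_size elements, and returns as soon as a block has a hit.
# Note: unlike any(), this returns FALSE rather than NA when the only
# non-FALSE results are NA.
any_chunked <- function(x, pred, chunk_size = 10000L) {
  n <- length(x)
  start <- 1L
  while (start <= n) {
    end <- min(start + chunk_size - 1L, n)
    # isTRUE() treats an NA result from any() as "no hit in this block"
    if (isTRUE(any(pred(x[start:end])))) {
      return(TRUE)
    }
    start <- end + 1L
  }
  FALSE
}

# Made-up example data
my_big_vector <- c(NA, seq_len(1e6))
any_chunked(my_big_vector, is.na)               # hit is in the first block
any_chunked(my_big_vector, function(v) v == 5)
```

The code is longer and less expressive than any(is.na(x)), which is exactly the price described above: bounded memory and early exit in exchange for conciseness.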

Upvotes: 3

nico

Reputation: 51640

which(is.na(my_big_vector))
which(my_big_vector == 5)
which(my_big_vector < 3)

And if you want to count them...

length(which(is.na(my_big_vector)))
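
The count can also be taken straight from the logical vector, since sum() treats TRUE as 1; this skips the index vector that which() builds, although the logical vector itself is still allocated. A small sketch with made-up data:

```r
# Hypothetical example data
my_big_vector <- c(1, NA, 5, 2, NA, 5)

# TRUE is coerced to 1, so sum() counts the matches
sum(is.na(my_big_vector))

# For comparisons, na.rm = TRUE drops the NA results that == produces
sum(my_big_vector == 5, na.rm = TRUE)
```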

Upvotes: 0
