Reputation: 22175
One thing I want to do all the time in my R code is to test whether certain conditions hold for a vector, such as whether it contains any or all values equal to some specified value. The R
ish way to do this is to create a boolean vector and use any or all, for example:
any(is.na(my_big_vector))
all(my_big_vector == my_big_vector[[1]])
...
It seems really inefficient to me to allocate a big vector and fill it with values, just to throw it away (especially if any()
or all()
call can be short-circuited after testing only a couple of the values. Is there a better way to do this, or should I just hand in my desire to write code that is both efficient and succinct when working in R
?
Upvotes: 3
Views: 792
Reputation: 18628
I think it is not a good idea -- R is a very high-level language, so what you should do is to follow standards. This way R developers know what to optimize. You should also remember that while R is functional and lazy language, it is even possible that statement like
any(is.na(a))
can be recognized and executed as something like
.Internal(is_any_na,a)
Upvotes: 0
Reputation: 368181
"Cheap, fast, reliable: pick any two" is a dry way of saying that you sometimes need to order your priorities when building or designing systems.
It is rather similar here: the cost of the concise expression is the fact that memory gets allocated behind the scenes. If that really is a problem, then you can always write a (compiled ?) routines to runs (quickly) along the vectors and uses only pair of values at a time.
You can trade off memory usage versus performance versus expressiveness, but is difficult to hit all three at the same time.
Upvotes: 3
Reputation: 51640
which(is.na(my_big_vector))
which(my_big_vector == 5)
which(my_big_vector < 3)
And if you want to count them...
length(which(is.na(my_big_vector)))
Upvotes: 0