Reputation: 43
Consider simple data:
> cbind(x,y)
       x  y
[1,]  -1 99
[2,]   5  4
[3,]  10 -2
[4,] 600  0
[5,] -16  1
[6,]   0 55
Now consider this simple nested ifelse statement:
ifelse(y>=0, ifelse(x<0,y,ifelse(x>y,y,x)), x)
Which gives me a result of:
[1] 99 4 10 0 1 0
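For anyone who wants to reproduce this, here is a minimal setup. The values are copied from the matrix printed above; `res` is just a name I picked for the result.

```r
# Sample data copied from the matrix printed above
x <- c(-1, 5, 10, 600, -16, 0)
y <- c(99, 4, -2, 0, 1, 55)

# The nested ifelse from the question, unchanged apart from spacing
res <- ifelse(y >= 0, ifelse(x < 0, y, ifelse(x > y, y, x)), x)
res
```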
It should be easy to see what the code does: it replaces values in x with either:
1) the value in y, if both x and y are non-negative and y is the smaller of the two
2) any non-negative value of y, if x is negative
or leaves x alone.
My question is: this code is not very computationally efficient. Can you think of a more efficient way to write it? Thanks!
Upvotes: 3
Views: 168
Reputation: 4907
This is more of a summary of the above answers than a unique answer, but I do provide time comparisons. b is a small speedup from combining operations; c-e are all previously provided answers. @mra68's answer appears to be the fastest:
library(microbenchmark)
microbenchmark(a= ifelse(y>=0, ifelse(x<0,y,ifelse(x>y,y,x)), x),
b= {ifelse(y>= 0, ifelse(x>y | x<0, y,x), x)},
c= {z <- y; z[(x < y & x >= 0)| y < 0] <- x[(x < y & x >= 0)| y < 0];z},
d= x * ((x < y & x >= 0) | y < 0) + y * ((x > y & y >= 0) | x < 0),
e= (x+y + (1+2*((x<=y)*(x>=0)-(y>=0)))*(x-y))/2)
Unit: microseconds
 expr    min      lq     mean median     uq    max neval cld
    a 16.346 18.6270 21.88066 19.387 20.528 77.548   100   c
    b 10.644 11.4040 13.05781 11.785 12.545 39.154   100  b
    c  3.801  4.1820  5.10146  4.562  4.942 18.247   100 a
    d  3.041  3.4210  4.37168  3.801  3.802 33.452   100 a
    e  2.281  2.8515  3.36810  3.041  3.421 18.246   100 a
Though, IMO, the loss of readability in the fastest solution isn't worth the speedup.
Depending on the actual use case, you could also gain some speed by ordering your if-else
tests so that as few cases as possible reach the innermost ones.
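Timings are only comparable if the expressions return the same thing, so it seems worth checking that first. This is a rough sketch on random inputs; the seed, the vector length, and the name `c_res` are my own choices, not from the answers.

```r
set.seed(42)  # arbitrary seed, only for a reproducible check
x <- sample(-100:100, 1000, replace = TRUE)
y <- sample(-100:100, 1000, replace = TRUE)

# The benchmarked expressions, evaluated once each
a <- ifelse(y >= 0, ifelse(x < 0, y, ifelse(x > y, y, x)), x)
b <- ifelse(y >= 0, ifelse(x > y | x < 0, y, x), x)
c_res <- { z <- y; z[(x < y & x >= 0) | y < 0] <- x[(x < y & x >= 0) | y < 0]; z }
d <- x * ((x < y & x >= 0) | y < 0) + y * ((x > y & y >= 0) | x < 0)
e <- (x + y + (1 + 2 * ((x <= y) * (x >= 0) - (y >= 0))) * (x - y)) / 2

# b, c and e should agree with a element by element
stopifnot(all(a == b), all(a == c_res), all(a == e))

# Number of positions where d differs from a, if any
sum(a != d)
```

Running a check like this before `microbenchmark` makes it harder to end up comparing the speed of expressions that compute different things.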
Upvotes: 1
Reputation: 13570
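Another option without indexing. It matches the nested ifelse on the sample data, but the two masks are not exact complements, so it returns 0 when x == y > 0 and x + y when both values are negative: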
x * ((x < y & x >= 0) | y < 0) + y * ((x > y & y >= 0) | x < 0)
Output:
[1] 99 4 10 0 1 0
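A quick way to see how the mask arithmetic behaves outside the sample data is to feed it a tie and a pair of negatives. The inputs below are made up for illustration, and `keep_x` and `fixed` are hypothetical names for a variant whose second mask is simply the negation of the first. That variant is an assumption on my part, not something taken from the answers.

```r
# Made-up edge cases that do not occur in the sample data:
# a non-negative tie, two negatives, and one ordinary pair
x <- c(5, -3, 2)
y <- c(5, -2, 7)

nested <- ifelse(y >= 0, ifelse(x < 0, y, ifelse(x > y, y, x)), x)
masked <- x * ((x < y & x >= 0) | y < 0) + y * ((x > y & y >= 0) | x < 0)

# Variant with exactly complementary masks: keep x where keep_x is TRUE,
# take y everywhere else
keep_x <- (x <= y & x >= 0) | y < 0
fixed  <- x * keep_x + y * (!keep_x)

nested
masked
fixed
```

With these inputs `fixed` agrees with `nested`, while `masked` differs in the first two positions.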
Time comparison: mra68's answer seems to be the fastest by median, although the spread is small among the three vectorized options:
library(microbenchmark)
microbenchmark(
TylerRinker = z[(x < y & x >= 0)| y < 0] <- x[(x < y & x >= 0)| y < 0],
mra68 =(x+y + (1+2*((x<=y)*(x>=0)-(y>=0)))*(x-y))/2,
mpalanco = x *((x < y & x >= 0)| y < 0)+ y * ((x > y & y >= 0)| x < 0),
if_else = ifelse(y>=0, ifelse(x<0,y,ifelse(x>y,y,x)), x)
)
Unit: microseconds
        expr    min      lq     mean median     uq     max neval cld
 TylerRinker  8.800  9.7780 11.47480 10.267 10.268  75.778   100  a
       mra68  5.867  6.3560  9.40188  6.845  7.334 214.623   100  a
    mpalanco  7.334  7.8230  8.67836  8.311  8.800  30.312   100  a
     if_else 44.489 45.9565 54.61929 53.289 53.290 245.911   100   b
Upvotes: 3
Reputation: 2960
You can use the fact that x is the sum and y is the difference of (x+y)/2 and (x-y)/2. Then calculate with logical expressions (TRUE equals 1 and FALSE equals 0):
(x+y + (1+2*((x<=y)*(x>=0)-(y>=0)))*(x-y))/2
gives the same result as the nested ifelse expression.
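To make the trick easier to follow, the expression can be split into its parts. This is only a sketch: `s`, `d` and `k` are names introduced here for illustration. With s = (x+y)/2 and d = (x-y)/2 we have x = s + d and y = s - d, so the result is s + k*d, where k is +1 wherever x is kept and -1 wherever y replaces it.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- sample(-100:100, 500, replace = TRUE)
y <- sample(-100:100, 500, replace = TRUE)

s <- (x + y) / 2  # half-sum
d <- (x - y) / 2  # half-difference

# Selector built from logicals coerced to 0/1:
#   y < 0            -> 1 + 2 * (0 - 0) = +1  (keep x)
#   0 <= x <= y      -> 1 + 2 * (1 - 1) = +1  (keep x)
#   y >= 0 otherwise -> 1 + 2 * (0 - 1) = -1  (take y)
k <- 1 + 2 * ((x <= y) * (x >= 0) - (y >= 0))

A <- s + k * d
B <- ifelse(y >= 0, ifelse(x < 0, y, ifelse(x > y, y, x)), x)
all(A == B)
```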
Speed comparison, using vectors of length 500:
> set.seed(1)
> x <- sample(-100:100,500,replace=TRUE)
> y <- sample(-100:100,500,replace=TRUE)
> system.time(
+ for ( i in 1:100000 )
+ {
+ A <- (x+y + (1+2*((x<=y)*(x>=0)-(y>=0)))*(x-y))/2
+ }
+ )
   user  system elapsed
   8.46    0.00    8.51
> system.time(
+ for ( i in 1:100000 )
+ {
+ B <- ifelse(y>=0, ifelse(x<0,y,ifelse(x>y,y,x)), x)
+ }
+ )
   user  system elapsed
  74.58    0.03   75.05
> system.time(
+ for ( i in 1:100000 )
+ {
+ z <- y
+ z[(x < y & x >= 0)| y < 0] <- x[(x < y & x >= 0)| y < 0];z
+ }
+ )
   user  system elapsed
  23.32    0.00   23.44
Check if the results are the same:
> all(A==B)
[1] TRUE
> all(A==z)
[1] TRUE
>
Upvotes: 3
Reputation: 109874
Maybe just indexing. I don't know if it's any more efficient:
dat <- read.table(text=" x y
[1,] -1 99
[2,] 5 4
[3,] 10 -2
[4,] 600 0
[5,] -16 1
[6,] 0 55", header=TRUE)
x <- dat[, 1]
y <- dat[, 2]
z <- y
z[(x < y & x >= 0)| y < 0] <- x[(x < y & x >= 0)| y < 0];z
## 99 4 10 0 1 0
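Since the same logical mask appears on both sides of the assignment, one small tweak is to compute it once and reuse it. A minimal sketch on the sample data, with `keep_x` as a hypothetical name for the mask; I have not timed it against the version above.

```r
# Sample data from the question
x <- c(-1, 5, 10, 600, -16, 0)
y <- c(99, 4, -2, 0, 1, 55)

# Positions where x is kept; evaluated once instead of twice
keep_x <- (x < y & x >= 0) | y < 0

z <- y                   # start from y ...
z[keep_x] <- x[keep_x]   # ... and overwrite the positions that keep x
z
```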
Upvotes: 2