Reputation: 531
I would like to use the ifelse function to create a new variable, say z. However, one of the return values depends on the i-th column of a matrix. Here is a simple example:
set.seed(1)
data <- data.frame(x = rnorm(10), y = rnorm(10), ind = rep(c(0, 1), 5))
m <- data.frame(matrix(rnorm(100), 10, 10))
z <- ifelse(data$ind == 1, data$x, sum(m[, i]))
I know the line with z won't run, but it illustrates what I would like to do. If subject i has ind equal to 0, then z should be the sum of the 10 entries in column i of m; otherwise z should be that subject's x.
Could I do this with ifelse, or would I need a for loop? I'm trying to stay away from for loops, which is why I am trying ifelse in the first place.
Here is what z should look like:
z
[1] -1.3367324 0.1836433 1.3413668 1.5952808 4.5120996 -0.8204684 1.2736029
[8] 0.7383247 3.4748021 -0.3053884
Thanks!
Upvotes: 2
Views: 125
Reputation: 886968
Or we can use arithmetic:
colSums(m)*(data$ind==0) + (data$ind==1)*data$x
# X1 X2 X3 X4 X5 X6 X7
#-1.3367324 0.1836433 1.3413668 1.5952808 4.5120996 -0.8204684 1.2736029
# X8 X9 X10
# 0.7383247 3.4748021 -0.3053884
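The arithmetic version works because R coerces TRUE/FALSE to 1/0 when a logical vector appears in a product, so each comparison acts as a mask. Here is a minimal sketch on hand-made values; ind, x and s are hypothetical stand-ins for data$ind, data$x and colSums(m), not objects from the question:

```r
ind <- c(0, 1, 0)      # hypothetical indicator, stand-in for data$ind
x   <- c(10, 20, 30)   # stand-in for data$x
s   <- c(1, 2, 3)      # stand-in for colSums(m)

# Each logical mask is coerced to 1/0, so a term vanishes where its mask is FALSE
z <- s * (ind == 0) + x * (ind == 1)
z
# [1]  1 20  3
```

One caveat: a masked-out NA or Inf still leaks through, because 0 * NA is NA and 0 * Inf is NaN, so ifelse is probably the safer choice when the data can contain such values.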
Upvotes: 3
Reputation: 6267
Yes, you can do it with ifelse, in a one-liner very close to what you wrote:
z <- ifelse(data$ind == 0, colSums(m), data$x)
Here is what R does when it executes this statement: it evaluates the logical test data$ind == 0, and stores in memory the two numeric vectors colSums(m) and data$x. Where (data$ind == 0) is TRUE, it outputs the corresponding element of colSums(m); where (data$ind == 0) is FALSE, it outputs the corresponding element of data$x.
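To illustrate the element-wise selection described above, here is a small sketch on made-up vectors (test, yes and no are hypothetical names chosen to mirror the arguments of ifelse). It also shows why a single sum(...) in place of colSums(m) would not give one value per subject: a length-one argument is simply recycled.

```r
test <- c(TRUE, FALSE, TRUE)
yes  <- c(1, 2, 3)
no   <- c(10, 20, 30)

# Element k of the result is yes[k] where test[k] is TRUE, otherwise no[k]
r1 <- ifelse(test, yes, no)
r1
# [1]  1 20  3

# A length-one 'yes' is recycled, so every TRUE position receives the same value
r2 <- ifelse(test, sum(yes), no)
r2
# [1]  6 20  6
```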
Upvotes: 4
Reputation: 24945
You can do it in a two-liner instead:
z <- data$x
z[data$ind == 0] <- colSums(m[,data$ind == 0])
[1] -1.3367324 0.1836433 1.3413668 1.5952808 4.5120996 -0.8204684 1.2736029 0.7383247 3.4748021
[10] -0.3053884
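The two-liner relies on logical indexing on both sides of the assignment: the same TRUE/FALSE vector picks the positions of z to overwrite and the columns of m to sum. Below is a minimal sketch on a tiny hand-made data frame (keep, and the columns a to d, are hypothetical names). It adds drop = FALSE as a precaution, on the assumption that sometimes only one column might be selected; without it a single column collapses to a plain vector and colSums fails.

```r
x   <- c(10, 20, 30, 40)
ind <- c(0, 1, 0, 1)
m   <- data.frame(a = c(1, 1), b = c(2, 2), c = c(3, 3), d = c(4, 4))

keep <- ind == 0                      # TRUE for positions 1 and 3
z <- x
# Sum only the selected columns and write them into the matching positions
z[keep] <- colSums(m[, keep, drop = FALSE])
z
# [1]  2 20  6 40
```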
More generally, you could use an apply function. This will in general be slower than a straight vectorised solution like the one above. Here's sapply:
sapply(1:nrow(data), function(x){ifelse(data$ind[x] == 1, data$x[x], sum(m[, x]))})
[1] -1.3367324 0.1836433 1.3413668 1.5952808 4.5120996 -0.8204684 1.2736029 0.7383247 3.4748021
[10] -0.3053884
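As a possible variant of the loop-style approach, vapply with a plain if/else can stand in for sapply with ifelse, since each call only handles a single subject. This is a sketch rather than a recommendation; z_loop and z_vec are hypothetical names, and the final line only checks that the loop agrees with the vectorised ifelse one-liner on the question's data.

```r
set.seed(1)
data <- data.frame(x = rnorm(10), y = rnorm(10), ind = rep(c(0, 1), 5))
m <- data.frame(matrix(rnorm(100), 10, 10))

# One scalar decision per subject; numeric(1) declares the expected return type
z_loop <- vapply(seq_len(nrow(data)), function(i) {
  if (data$ind[i] == 1) data$x[i] else sum(m[, i])
}, numeric(1))

# Vectorised equivalent, used here only for comparison
z_vec <- ifelse(data$ind == 0, colSums(m), data$x)
all.equal(z_loop, z_vec)
```

vapply fails loudly if a call returns something other than a single number, which sapply would silently accept.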
A benchmark:
microbenchmark::microbenchmark(
  sapply = sapply(1:nrow(data), function(x){ifelse(data$ind[x] == 1, data$x[x], sum(m[, x]))}),
  vectorised = {
    z <- data$x
    z[data$ind == 0] <- colSums(m[, data$ind == 0])
  }
)
Unit: microseconds
       expr     min      lq     mean   median       uq     max neval cld
     sapply 391.297 408.193 423.6525 412.4170 423.7450 853.249   100   b
 vectorised 197.377 199.873 208.7701 202.5605 214.4645 284.545   100  a
Upvotes: 3