Reputation: 253
I have a numeric matrix of 30,000 rows and 3 columns. I would like to generate a simple PASS/FAIL vector (or factor) based on the 3 values in each row of the matrix. I would like to apply the following logic:
If all 3 values in row > 3, enter PASS, else FAIL.
I know how to do this with a for loop, but how could I do it faster? I have dozens of these matrices... Thank you!
as.matrix(rbind(c(129,129,120),c(135,97,96),c(0,0,0),c(39,4,2)))
desired output: PASS, PASS, FAIL, FAIL
Upvotes: 3
Views: 211
Reputation: 23758
Unlike the other answers here, this uses rowSums, which does not loop at the R level and can outrun multiple subsets and logical comparisons. It's probably the fastest route.
mat <- as.matrix(rbind(c(129,129,120),c(135,97,96),c(0,0,0),c(39,4,2)))
vec <- ifelse(rowSums(mat > 3) == 3, TRUE, FALSE)
We could also bypass ifelse and make it even faster.
vec <- rowSums(mat > 3) == 3
If you time these, the second version will probably be the winner. On my system, using 30,000-row matrices, my first answer comes out about twice as fast as gung's answer, and the second one comes out 10x as fast; it can process 1,000 matrices of 30,000 rows each in about 2 seconds. Codoremifa's answer is the fastest data.table-based answer here, and it takes 20 s (similar to gung's answer).
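If you want to check timings like these on your own machine, here is a rough sketch using system.time. The matrix `big` is made-up random data (not the asker's), the seed is arbitrary, and the absolute numbers will differ from system to system.

```r
# Hypothetical test data: 30,000 rows x 3 columns of random integers
set.seed(1)
big <- matrix(sample(0:150, 30000 * 3, replace = TRUE), ncol = 3)

# Row-by-row check via apply (loops over rows internally)
t_apply <- system.time(res_apply <- apply(big, 1, function(x) all(x > 3)))

# Vectorized check: count how many values per row exceed 3
t_rowsums <- system.time(res_rowsums <- rowSums(big > 3) == 3)

# Both approaches should agree on every row
identical(res_apply, res_rowsums)

# Compare elapsed times (values depend on the machine)
print(rbind(apply = t_apply, rowSums = t_rowsums))
```

For more stable numbers you would want to repeat each call many times rather than rely on a single run.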
NOTE: I kind of ignored your request for a "PASS", "FAIL" vector since you seemed to indicate speed was of paramount importance and it's a trivial semantic distinction. Furthermore, the logical vector is already prepared to subset the matrices if necessary.
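If the labels are needed after all, a minimal sketch of converting the logical vector and of using it to subset is below. The names `labels_chr`, `labels_fac`, and `passing` are just illustrative.

```r
mat <- as.matrix(rbind(c(129,129,120), c(135,97,96), c(0,0,0), c(39,4,2)))
vec <- rowSums(mat > 3) == 3

# Character labels from the logical vector
labels_chr <- ifelse(vec, "PASS", "FAIL")

# Or a factor with explicit levels, so both labels exist even if one is absent
labels_fac <- factor(vec, levels = c(FALSE, TRUE), labels = c("FAIL", "PASS"))

# The logical vector can subset the matrix directly;
# drop = FALSE keeps a matrix even when only one row passes
passing <- mat[vec, , drop = FALSE]
```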
Upvotes: 4
Reputation: 11893
For problems like this, my first inclination is to combine ?all, ?apply, and ?ifelse, perhaps like the solution @Ananda provides. As he mentions, apply() uses a loop internally. If you want a completely vectorized solution, you could try:
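# xMatrix is assumed here to be the example matrix from the question
xMatrix <- as.matrix(rbind(c(129,129,120),c(135,97,96),c(0,0,0),c(39,4,2)))
newVector <- ifelse(xMatrix[,1] > 3 & xMatrix[,2] > 3 & xMatrix[,3] > 3,
                    "PASS", "FAIL")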
Vectorization is a handy feature of R, and it is much faster than loops. You can read about vectorization here.
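If the matrices do not always have exactly 3 columns, the same column-wise idea can be written without naming each column. This is only a sketch: it swaps in Reduce to combine the per-column comparisons, and `cols_ok` and `all_ok` are illustrative names. Note that lapply here loops over the columns (3 iterations), not over the 30,000 rows.

```r
xMatrix <- as.matrix(rbind(c(129,129,120), c(135,97,96), c(0,0,0), c(39,4,2)))

# One logical vector per column: is each value in that column > 3?
cols_ok <- lapply(seq_len(ncol(xMatrix)), function(j) xMatrix[, j] > 3)

# Combine the per-column vectors element-wise with &
all_ok <- Reduce(`&`, cols_ok)

newVector <- ifelse(all_ok, "PASS", "FAIL")
```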
Upvotes: 1
Reputation: 193517
Use all and apply (though apply uses its own loops).
m <- as.matrix(rbind(c(129,129,120),c(135,97,96),c(0,0,0),c(39,4,2)))
apply(m, 1, function(x) all(x > 3))
# [1] TRUE TRUE FALSE FALSE
If you really want "PASS" and "FAIL" instead, you can factor the result of the apply step.
factor(apply(m, 1, function(x) all(x > 3)),
levels = c(FALSE, TRUE),
labels = c("FAIL", "PASS"))
# [1] PASS PASS FAIL FAIL
# Levels: FAIL PASS
Extending Codoremifa's answer a little, a similar approach works with data.table, especially since you specify that you want a vector or factor as the output.
library(data.table)
DT <- data.table(m)
DT[, all(.SD > 3), by = 1:nrow(DT)][, factor(V1, labels = c("FAIL", "PASS"))]
# [1] PASS PASS FAIL FAIL
# Levels: FAIL PASS
Upvotes: 5
Reputation: 13122
Also, mapply:
mat <- as.matrix(rbind(c(129,129,120),c(135,97,96),c(0,0,0),c(39,4,2)))
fun <- function(x, y, z) { ifelse(x > 3 & y > 3 & z > 3, "PASS", "FAIL") }
mapply(fun, mat[,1], mat[,2], mat[,3])
#[1] "PASS" "PASS" "FAIL" "FAIL"
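Since ifelse and the comparison operators are already vectorized, it looks like `fun` can also be called on the whole columns directly, without mapply. A small sketch comparing the two calls (`direct` and `via_mapply` are illustrative names):

```r
mat <- as.matrix(rbind(c(129,129,120), c(135,97,96), c(0,0,0), c(39,4,2)))
fun <- function(x, y, z) { ifelse(x > 3 & y > 3 & z > 3, "PASS", "FAIL") }

# Call once on the full columns; ifelse handles all rows in one pass
direct <- fun(mat[,1], mat[,2], mat[,3])

# Call once per row via mapply, as in the answer above
via_mapply <- mapply(fun, mat[,1], mat[,2], mat[,3])

# Check that both give the same result
identical(direct, via_mapply)
```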
Upvotes: 2
Reputation: 12875
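library(data.table)
dt <- as.matrix(rbind(c(129,129,120),c(135,97,96),c(0,0,0),c(39,4,2)))
dt <- data.table(dt)  # columns are auto-named V1, V2, V3
dt[, Indicator := "FAIL"]  # default every row to FAIL
dt[V1 > 3 & V2 > 3 & V3 > 3, Indicator := "PASS"]  # overwrite rows where all three values exceed 3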
Upvotes: 2