Reputation: 21047
I have a data frame which looks as follows:
muestra[1:10,2:5]
## X0 X1 X2 X3
## 21129 0 0 0 0
## 34632 0 0 0 0
## 30612 0 0 0 0
## 10687 0 0 1 2
## 44815 0 0 0 1
## 40552 0 0 0 1
## 15311 0 0 0 0
## 33960 0 0 0 0
## 24073 0 0 0 0
## 13077 0 0 0 0
I'm comparing the rows against a particular vector of values:
muestra[1:10,2:5] == c(0,0,0,0)
## X0 X1 X2 X3
## 21129 TRUE TRUE TRUE TRUE
## 34632 TRUE TRUE TRUE TRUE
## 30612 TRUE TRUE TRUE TRUE
## 10687 TRUE TRUE FALSE FALSE
## 44815 TRUE TRUE TRUE FALSE
## 40552 TRUE TRUE TRUE FALSE
## 15311 TRUE TRUE TRUE TRUE
## 33960 TRUE TRUE TRUE TRUE
## 24073 TRUE TRUE TRUE TRUE
## 13077 TRUE TRUE TRUE TRUE
The comparison vector might change; for example, it could be c(0,0,1,0), c(1,2,1,2), and so on.
I'd like to check whether each full row meets the condition. Is there a function that returns something like this:
some_function(muestra[1:10,2:5], c(0,0,0,0))
## some_function(muestra[1:10,2:5], c(0,0,0,0))
## 21129 TRUE
## 34632 TRUE
## 30612 TRUE
## 10687 FALSE
## 44815 FALSE
## 40552 FALSE
## 15311 TRUE
## 33960 TRUE
## 24073 TRUE
## 13077 TRUE
Upvotes: 1
Views: 5239
Reputation: 73295
You are looking for all(). Apply all() to each row.
Let's consider a more general target vector, say y <- c(0,0,1,0); then we could do:
x <- muestra[1:10,2:5]
apply(x == rep(y, each = nrow(x)), 1, all)
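The rep(y, each = nrow(x)) step matters because == recycles a shorter vector down the columns of a matrix or data frame, not across each row. The all-zero vector in the question hides this, since every element is the same. A minimal sketch on a made-up 3 x 4 matrix (the values are illustrative, not taken from muestra):

```r
# Made-up matrix; rows 1 and 3 equal the target y, row 2 does not.
x <- matrix(c(0, 0, 1, 0,
              1, 0, 0, 0,
              0, 0, 1, 0),
            nrow = 3, byrow = TRUE)
y <- c(0, 0, 1, 0)

# Plain `x == y` recycles y in column-major order: y[1] meets x[1, 1],
# y[2] meets x[2, 1], y[3] meets x[3, 1], y[4] meets x[1, 2], and so on.
naive <- x == y

# rep(y, each = nrow(x)) repeats each y[j] nrow(x) times, so column j of x
# is compared with y[j].
aligned <- x == rep(y, each = nrow(x))

apply(naive, 1, all)    # TRUE FALSE FALSE (row 3 is wrongly rejected)
apply(aligned, 1, all)  # TRUE FALSE TRUE
```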
apply() is inefficient because it loops over the rows instead of being vectorized. For this job I would choose rowSums() instead:
rowSums(x == rep(y, each = nrow(x))) == ncol(x)
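For something closer to the some_function() asked for in the question, the rowSums() line can be wrapped in a small helper. This is only a sketch: the name row_matches and the data frame muestra_small are hypothetical, and the data are made up in the shape of muestra[1:10, 2:5].

```r
# Hypothetical helper: TRUE for every row of x that equals the target vector y.
row_matches <- function(x, y) {
  x <- as.matrix(x)                  # accept a data frame or a matrix
  stopifnot(length(y) == ncol(x))    # one target value per column
  rowSums(x == rep(y, each = nrow(x))) == ncol(x)
}

# Made-up data; row names are kept in the result.
muestra_small <- data.frame(
  X0 = c(0, 0, 0), X1 = c(0, 0, 0), X2 = c(0, 1, 0), X3 = c(0, 2, 1),
  row.names = c("21129", "10687", "44815")
)

row_matches(muestra_small, c(0, 0, 0, 0))  # TRUE FALSE FALSE
row_matches(muestra_small, c(0, 0, 1, 2))  # FALSE TRUE FALSE
```

Note that a row containing NA would give NA rather than FALSE here, because rowSums() propagates NA by default.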
I am happy to make a benchmark, too. I did not know until now that there is a function col. But it seems that using rep is slightly more efficient:
set.seed(123)
x <- matrix(sample(1e7), ncol = 10)
y <- sample(10)
library(microbenchmark)
microbenchmark(" ZL_apply:" = apply(x == rep(y, each = nrow(x)), 1, all),
"ZL_rowSums:" = rowSums(x == rep(y, each = nrow(x))) == ncol(x),
" DA:" = rowSums(x == y[col(x)]) == ncol(x))
Unit: milliseconds
expr min lq mean median uq max neval
ZL_apply: 3278.6738 3312.5376 3349.2760 3347.4750 3378.5720 3506.4211 100
ZL_rowSums: 314.2683 318.1528 331.2623 324.5413 336.5447 427.5261 100
DA: 422.7039 432.3683 461.4871 461.8067 476.1305 624.4142 100
Upvotes: 5
Reputation: 92292
Pardon me for not liking by-row operations. I would combine col with rowSums instead:
rowSums(df == c(0,0,0,0)[col(df)]) == ncol(df)
# 21129 34632 30612 10687 44815 40552 15311 33960 24073 13077
# TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
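To see why indexing with col() works, it may help to look at what it returns on a tiny example: a matrix of column numbers, which, when used as an index, expands the target vector so that each value lines up with its own column. The matrix m and vector y below are made up for illustration.

```r
m <- matrix(0, nrow = 2, ncol = 3)

# col(m) holds the column number of every cell.
col(m)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    1    2    3

y <- c(10, 20, 30)

# Indexing a plain vector with that matrix walks it in column-major order,
# so each y[j] is repeated once per row.
y[col(m)]
# [1] 10 10 20 20 30 30

# The expansion is the same one that rep(y, each = nrow(m)) produces.
identical(y[col(m)], rep(y, each = nrow(m)))
# [1] TRUE
```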
A benchmark:
set.seed(123)
df <- as.data.frame(matrix(sample(1e7), ncol = 10))
vec <- sample(10)
library(microbenchmark)
microbenchmark("ZL: " = apply(df == vec, 1, all),
               "DA: " = rowSums(df == vec[col(df)]) == ncol(df))
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# ZL: 2262.580 2386.5286 2421.7244 2420.6767 2454.1483 2592.888 100 b
# DA: 786.121 807.1531 836.7408 827.1577 849.9955 1038.139 100 a
Upvotes: 3