Reputation: 8202
I have a data.frame in R. I want to try two different conditions on two different columns, but I want these conditions to be inclusive. Therefore, I would like to use "OR" to combine the conditions. I have used the following syntax before with lot of success when I wanted to use the "AND" condition.
my.data.frame <- data[(data$V1 > 2) & (data$V2 < 4), ]
But I don't know how to use an 'OR' in the above.
Upvotes: 207
Views: 729659
Reputation: 6954
In case anyone is looking for a very scalable solution that is applicable if you want to test multiple columns for the same condition, you can use Reduce
or rowSums
.
df <- base::expand.grid(x = c(0, 1),
y = c(0, 1),
z = c(0, 1))
df
#> x y z
#> 1 0 0 0
#> 2 1 0 0
#> 3 0 1 0
#> 4 1 1 0
#> 5 0 0 1
#> 6 1 0 1
#> 7 0 1 1
#> 8 1 1 1
Does it contain any 0? Keeps every row except for row 8 that is filled with 1 only.
The function + in Reduce()
basically works as an OR operator since its result is >0 if it contains any TRUE value.
## Reduce ---------------------------------------------------
df[Reduce(f = `+`, x = lapply(df, `==`, 0)) > 0, ]
#> x y z
#> 1 0 0 0
#> 2 1 0 0
#> 3 0 1 0
#> 4 1 1 0
#> 5 0 0 1
#> 6 1 0 1
#> 7 0 1 1
## rowSums --------------------------------------------------
df[rowSums(df == 0) > 0, ]
#> x y z
#> 1 0 0 0
#> 2 1 0 0
#> 3 0 1 0
#> 4 1 1 0
#> 5 0 0 1
#> 6 1 0 1
#> 7 0 1 1
Note that you can use Reduce
also easily to apply multiple AND
conditions by using *
instead of +
. Multiplying all logicals only returns a value >0
if all cases are TRUE
.
df[Reduce(`*`, lapply(df, `==`, 0)) > 0, ]
#> x y z
#> 1 0 0 0
Upvotes: 1
Reputation: 41553
A data.table
option for completeness:
library(data.table)
dt <- data.table(V1 = runif(10, 0, 1),
V2 = letters[1:10])
dt[V1 > 0.5 | V2 == "b",]
#> V1 V2
#> 1: 0.7294220 a
#> 2: 0.9717687 b
#> 3: 0.7177076 c
#> 4: 0.5963838 e
#> 5: 0.5456320 i
Created on 2022-07-10 by the reprex package (v2.0.1)
For more info about this useful package check this link.
Upvotes: 0
Reputation: 263481
my.data.frame <- subset(data , V1 > 2 | V2 < 4)
An alternative solution that mimics the behavior of this function and would be more appropriate for inclusion within a function body:
new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]
Some people criticize the use of which
as not needed, but it does prevent the NA
values from throwing back unwanted results. The equivalent (.i.e not returning NA-rows for any NA's in V1 or V2) to the two options demonstrated above without the which
would be:
new.data <- data[ !is.na(data$V1 | data$V2) & ( data$V1 > 2 | data$V2 < 4) , ]
Note: I want to thank the anonymous contributor that attempted to fix the error in the code immediately above, a fix that got rejected by the moderators. There was actually an additional error that I noticed when I was correcting the first one. The conditional clause that checks for NA values needs to be first if it is to be handled as I intended, since ...
> NA & 1
[1] NA
> 0 & NA
[1] FALSE
Order of arguments may matter when using '&".
Upvotes: 289
Reputation: 13580
Just for the sake of completeness, we can use the operators [
and [[
:
set.seed(1)
df <- data.frame(v1 = runif(10), v2 = letters[1:10])
Several options
df[df[1] < 0.5 | df[2] == "g", ]
df[df[[1]] < 0.5 | df[[2]] == "g", ]
df[df["v1"] < 0.5 | df["v2"] == "g", ]
df$name is equivalent to df[["name", exact = FALSE]]
Using dplyr
:
library(dplyr)
filter(df, v1 < 0.5 | v2 == "g")
Using sqldf
:
library(sqldf)
sqldf('SELECT *
FROM df
WHERE v1 < 0.5 OR v2 = "g"')
Output for the above options:
v1 v2
1 0.26550866 a
2 0.37212390 b
3 0.20168193 e
4 0.94467527 g
5 0.06178627 j
Upvotes: 19
Reputation: 1070
You are looking for "|." See http://cran.r-project.org/doc/manuals/R-intro.html#Logical-vectors
my.data.frame <- data[(data$V1 > 2) | (data$V2 < 4), ]
Upvotes: 35