Reputation: 47
Let's say I have this dataset:
data1 = sample(1:250, 250)
data2 = sample(1:250, 250)
data <- data.frame(data1,data2)
If I want to subset 'data' by 30 values in both 'data1' and 'data2', what would be the best way to do that? For example, from 'data' I want to select all rows where data1 = 4 or 12 or 13 or 24 and data2 = 4 or 12 or 13 or 24. I want rows where both conditions are true.
I wrote this out like:
subdata <- subset(data, data1 == 4 |data1 == 12 |data1 == 13 |data1 == 24 & data2 == 4 |data2 == 12 |data2 == 13 |data2 == 24)
But this doesn't seem to meet both conditions; rather, it's one or the other.
Upvotes: 3
Views: 18825
Reputation: 18661
Note that in your original subset call, you didn't wrap the | tests for data1 and data2 in brackets. Because & binds more tightly than |, only data1 == 24 & data2 == 4 is paired and every other test is ORed in on its own, so a row can pass when only one of the columns matches. You actually want:
subdata <- subset(data, (data1 == 4 |data1 == 12 |data1 == 13 |data1 == 24) & (data2 == 4 |data2 == 12 |data2 == 13 |data2 == 24))
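To see why the brackets matter, here is a minimal sketch on a tiny made-up data frame (toy and its values are illustrative only, not from the question). In R, a | b & c is read as a | (b & c):

```r
# Toy data, purely illustrative (not the question's dataset)
toy <- data.frame(data1 = c(4, 4, 99), data2 = c(12, 99, 12))

# Unbracketed: parsed as data1 == 4 | (data1 == 12 & data2 == 4) | data2 == 12
loose <- subset(toy, data1 == 4 | data1 == 12 & data2 == 4 | data2 == 12)

# Bracketed: a row is kept only when both columns match
strict <- subset(toy, (data1 == 4 | data1 == 12) & (data2 == 4 | data2 == 12))

nrow(loose)   # 3: every row matches in at least one column
nrow(strict)  # 1: only the first row matches in both columns
```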
Here is how you would modify your subset call with %in%:
subdata <- subset(data, (data1 %in% c(4, 12, 13, 24)) & (data2 %in% c(4, 12, 13, 24)))
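With around 30 values, as in the question, it may be easier to store the values once and reuse them for both columns. A sketch, assuming the modified sample data from the end of this answer; keep is a name introduced here for illustration and its contents are placeholders:

```r
# Sample data with repeated values so that some rows actually match
set.seed(1)
data <- data.frame(
  data1 = sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE),
  data2 = sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE)
)

# Placeholder list of values; extend it to the full set you need
keep <- c(4, 12, 13, 24)

# %in% binds more tightly than &, so no extra brackets are required
subdata <- subset(data, data1 %in% keep & data2 %in% keep)

# The same filter with plain logical indexing instead of subset()
subdata2 <- data[data$data1 %in% keep & data$data2 %in% keep, ]
```

Both forms should keep the same rows; the indexing form is sometimes preferred inside functions, where subset()'s non-standard evaluation can be surprising.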
Below I provide an elegant dplyr approach with filter_all:
library(dplyr)
data %>%
filter_all(all_vars(. %in% c(4, 12, 13, 24)))
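In more recent dplyr releases filter_all() is marked as superseded, and if_all() (available from dplyr 1.0.4) covers the same case. A sketch of the equivalent filter, assuming a recent dplyr and the modified sample data from the end of this answer; keep and res are names introduced here for illustration:

```r
library(dplyr)

# Sample data with repeated values so that some rows actually match
set.seed(1)
data <- data.frame(
  data1 = sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE),
  data2 = sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE)
)

keep <- c(4, 12, 13, 24)  # placeholder list of values

# Keep rows where every column's value is in keep
res <- data %>%
  filter(if_all(everything(), ~ .x %in% keep))
```

To test only some columns, everything() can be swapped for a selection such as c(data1, data2).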
Note:
Your sample calls do not easily produce sample data where the tests are actually true. As a result, the above solutions would likely return zero rows. I've therefore modified your sample dataset so that some rows actually match and can be subset.
Data:
set.seed(1)
data1 = sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE)
data2 = sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE)
data <- data.frame(data1,data2)
Upvotes: 2