Reputation: 324
I have the following dataframe:
Data <- structure(list(ID = c(101, 102, 103, 104, 105, 106
), V1 = c(1, 3, 3, 1, 1, 1), V2 = c(1, 1,
1, 1, 1, 1), V3 = c(3, 1, 1, 1, 1, 1), V4 = c(1,
1, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")
I want to subset for the entries that have values of 3 or higher for the variables V1, V2, V3, or V4. They can have a score of 3 or higher for one of the variables or multiple, but they need at least one.
The method I am currently working with looks like this:
set <- grep('V', names(Data))
Data <- Data[rowSums(Data[set] > 2) > 0, set]
I almost get what I need but I am missing the column ID.
I supposed I could store the IDs in a variable called keep and add them back into the dataframe later, so I tried it.
keep <- Data$ID
That doesn't work when using the c() function and naming a new column, since the number of replacement rows doesn't match. So I tried this:
keep <- as.data.frame(keep)
Data <- merge(Data, keep, by = c('ID'))
Which of course gives me an error because I forgot that Data won't have an existing ID column to merge with.
So now I am looking for a way to keep the ID column, either in one step or as part of the steps that subset for scores of 3 or higher.
Upvotes: 4
Views: 61
Reputation: 923
Does this work for you?
df_sub <- subset(Data, V1 >= 3 | V2 >= 3 | V3 >= 3 | V4 >= 3)
So the result would be
   ID V1 V2 V3 V4
1 101  1  1  3  1
2 102  3  1  1  1
3 103  3  1  1  1
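If you would rather not list every column by name, a base R variant that stays close to the rowSums() attempt in the question may also work. This is only a sketch: the names v_cols, keep_rows and result are made up for illustration, and the data frame is rebuilt here so the snippet runs on its own. The idea is to use the logical row filter but leave the column index empty, so ID is not dropped.

```r
# Rebuild the example data from the question so the snippet is self-contained
Data <- data.frame(
  ID = c(101, 102, 103, 104, 105, 106),
  V1 = c(1, 3, 3, 1, 1, 1),
  V2 = c(1, 1, 1, 1, 1, 1),
  V3 = c(3, 1, 1, 1, 1, 1),
  V4 = c(1, 1, 1, 1, 1, 1)
)

# Positions of the score columns (names starting with "V"); illustrative name
v_cols <- grep("^V", names(Data))

# TRUE for rows where at least one score column is 3 or higher
keep_rows <- rowSums(Data[v_cols] >= 3) > 0

# Empty column index keeps every column, including ID
result <- Data[keep_rows, ]
```

The only change from the question's code is the empty column index after the comma (plus >= 3 in place of > 2, which is equivalent for whole-number scores).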
Upvotes: 2
Reputation: 32538
library(dplyr)
Data %>% filter_at(vars(-ID), any_vars(. >= 3))
# OR
Data %>% filter_at(vars(starts_with("V")), any_vars(. >= 3))
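In recent dplyr releases the scoped verbs such as filter_at() are marked as superseded, so the same filter can probably be written with if_any() instead. The sketch below assumes dplyr 1.0.4 or later is installed (the version in which if_any() was introduced); the name result is illustrative and the data frame is rebuilt so the snippet is self-contained.

```r
library(dplyr)

# Rebuild the example data from the question so the snippet is self-contained
Data <- data.frame(
  ID = c(101, 102, 103, 104, 105, 106),
  V1 = c(1, 3, 3, 1, 1, 1),
  V2 = c(1, 1, 1, 1, 1, 1),
  V3 = c(3, 1, 1, 1, 1, 1),
  V4 = c(1, 1, 1, 1, 1, 1)
)

# Keep rows where any column starting with "V" is 3 or higher;
# all columns, including ID, are retained
result <- Data %>%
  filter(if_any(starts_with("V"), ~ .x >= 3))
```

As with the filter_at() versions, only the rows are filtered, so ID stays in the result.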
Upvotes: 2