AcidCatfish

Reputation: 324

How to easily subset for specific data in a dataframe?

I have the following dataframe:

Data <- structure(list(ID = c(101, 102, 103, 104, 105, 106
), V1 = c(1, 3, 3, 1, 1, 1), V2 = c(1, 1, 
1, 1, 1, 1), V3 = c(3, 1, 1, 1, 1, 1), V4 = c(1, 
1, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")

I want to keep the rows that have a value of 3 or higher in at least one of the variables V1, V2, V3, or V4. A row may reach 3 or higher in one variable or in several, but it needs at least one.

The method I am currently working with looks like this:

set <- grep('V', names(Data))
Data <- Data[rowSums(Data[set] > 2) > 0, set]

I almost get what I need but I am missing the column ID.

I supposed I could store the IDs in an object called keep and add them back into the dataframe later, so I tried it.

keep <- Data$ID

Adding keep back as a new named column with c() doesn't work, since the number of replacement rows doesn't match. So I tried this:

keep <- as.data.frame(keep)
Data <- merge(Data, keep, by = 'ID')

Which of course gives me an error because I forgot that Data won't have an existing ID column to merge with.

So now I am looking for a way to keep the ID column, either in one step or as part of the steps that subset for scores of 3 or higher.

Upvotes: 4

Views: 61

Answers (2)

Yellow_truffle

Reputation: 923

Does this work for you?

df_sub <- subset(Data, V1 >= 3 | V2 >= 3 | V3 >= 3 | V4 >= 3)

So the result would be:

   ID V1 V2 V3 V4
1 101  1  1  3  1
2 102  3  1  1  1
3 103  3  1  1  1
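
If there are too many V columns to spell out one by one, the same filter can probably be written with rowSums, which stays close to the approach in the question. This is only a minimal sketch, assuming the score columns are exactly those whose names start with "V"; the names score_cols, has_high and df_sub2 are illustrative and not from the original post. The key difference from the question's code is the empty column index, which keeps every column, including ID.

```r
# Same data as in the question
Data <- data.frame(
  ID = c(101, 102, 103, 104, 105, 106),
  V1 = c(1, 3, 3, 1, 1, 1),
  V2 = c(1, 1, 1, 1, 1, 1),
  V3 = c(3, 1, 1, 1, 1, 1),
  V4 = c(1, 1, 1, 1, 1, 1)
)

# Assumption: the score columns are those whose names start with "V"
score_cols <- grep("^V", names(Data))

# TRUE for rows where at least one score column is 3 or higher
has_high <- rowSums(Data[score_cols] >= 3) > 0

# Empty column index: filter the rows but keep all columns, so ID survives
df_sub2 <- Data[has_high, ]
```

Note that if the score columns contain NA values, rowSums would return NA for those rows unless na.rm = TRUE is passed.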

Upvotes: 2

d.b

Reputation: 32538

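library(dplyr)
# Keep rows where any column other than ID is 3 or higher
Data %>% filter_at(vars(-ID), any_vars(. >= 3))
# OR
# Keep rows where any column whose name starts with "V" is 3 or higher
Data %>% filter_at(vars(starts_with("V")), any_vars(. >= 3))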
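
In more recent dplyr releases, filter_at() is marked as superseded, and the same row filter can be expressed with if_any(). The following is a sketch under the assumption that dplyr 1.0.4 or later is installed (the version that introduced if_any()); the name res is illustrative only.

```r
library(dplyr)

# Same data as in the question
Data <- data.frame(
  ID = c(101, 102, 103, 104, 105, 106),
  V1 = c(1, 3, 3, 1, 1, 1),
  V2 = c(1, 1, 1, 1, 1, 1),
  V3 = c(3, 1, 1, 1, 1, 1),
  V4 = c(1, 1, 1, 1, 1, 1)
)

# Keep rows where at least one "V" column is 3 or higher;
# all columns, including ID, are retained
res <- Data %>% filter(if_any(starts_with("V"), ~ .x >= 3))
```

Both forms should select the same rows here; if_any() simply keeps the column selection and the condition inside an ordinary filter() call.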

Upvotes: 2
