Subsetting whole clusters from a dataframe

Question

In the data.frame below, how can I keep every row of any study (cluster) that contains at least one outcome larger than 1?

My desired output is shown below. I tried subset(h, outcome > 1), but that keeps only the individual rows with outcome > 1 rather than the whole study.

h = "
study outcome
a     1
a     2
a     1
b     1
b     1 
c     3
c     3"
h = read.table(text = h, header = TRUE)

DESIRED OUTPUT:
"
study outcome
a     1
a     2
a     1
c     3
c     3"

akrun · Accepted Answer

Modify the subset call in two steps:

Extract the 'study' values for the rows where outcome > 1.
Use %in% on 'study' against those values to create the final logical expression in subset.

subset(h, study %in% study[outcome > 1])

Output:

  study outcome
1     a       1
2     a       2
3     a       1
6     c       3
7     c       3
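
To see why this works, the one-liner can be unpacked into its intermediate steps. The following is a minimal sketch using the question's data; the names hits and keep are introduced here for illustration only and do not appear in the original answer.

```r
# Rebuild the example data from the question
h <- read.table(text = "
study outcome
a     1
a     2
a     1
b     1
b     1
c     3
c     3", header = TRUE)

# Step 1: studies that have at least one row with outcome > 1
# ('hits' is an illustrative name)
hits <- unique(as.character(h$study[h$outcome > 1]))

# Step 2: flag every row whose study is among those hits
# ('keep' is an illustrative name)
keep <- h$study %in% hits

# Step 3: index with the logical vector; this selects the same rows
# as subset(h, study %in% study[outcome > 1])
h[keep, ]
```

Because the test is made on 'study' rather than on 'outcome', rows with outcome 1 are kept as long as their study has a qualifying row somewhere.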

If we want to limit the result to the first 'n' studies that have an 'outcome' value greater than 1, get the unique 'study' values from the first expression of subset, use head to take the first 'n' of them, and use %in% to create the logical expression.

n <- 3
subset(h, study %in% head(unique(study[outcome > 1]), n))
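
In the question's data only two studies ('a' and 'c') qualify, so n <- 3 returns the same rows as the unrestricted call. As a rough illustration of the limiting effect, the sketch below uses n <- 1 instead; the name first_n is made up for this example.

```r
# Rebuild the example data from the question
h <- read.table(text = "
study outcome
a     1
a     2
a     1
b     1
b     1
c     3
c     3", header = TRUE)

# Keep only the first qualifying study
n <- 1
first_n <- subset(h, study %in% head(unique(study[outcome > 1]), n))

# Only the three rows of study 'a' remain; study 'c' also qualifies
# but is cut off by head()
first_n
```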

Alternatively, group by 'study' and keep the groups where any 'outcome' is greater than 1:

library(dplyr)
h %>%
    group_by(study) %>%
    filter(any(outcome > 1)) %>%
    ungroup()
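
If dplyr is not available, the same group-wise idea can probably be expressed in base R with ave(). This is a sketch of an alternative, not part of the original answer; the names keep and res are illustrative.

```r
# Rebuild the example data from the question
h <- read.table(text = "
study outcome
a     1
a     2
a     1
b     1
b     1
c     3
c     3", header = TRUE)

# ave() applies any() within each study and recycles the single
# TRUE/FALSE result to every row of that study
keep <- as.logical(ave(h$outcome > 1, h$study, FUN = any))

# Rows of studies 'a' and 'c' are kept; study 'b' is dropped
res <- h[keep, ]
res
```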
