rnorouzian

Subsetting whole clusters from a data frame

In my data.frame below, how can I keep every row of a study (cluster) that has at least one outcome larger than 1?

My desired output is shown below. I tried subset(h, outcome > 1), but that keeps only the individual rows with outcome > 1 instead of the whole clusters.

h = "
study outcome
a     1
a     2
a     1
b     1
b     1 
c     3
c     3"
h = read.table(text = h, header = TRUE)

DESIRED OUTPUT:
"
study outcome
a     1
a     2
a     1
c     3
c     3"


Answers (1)

akrun

Modify the subset call:

  1. Subset 'study' with the first logical expression, outcome > 1, to get the studies that contain at least one such outcome
  2. Use %in% on 'study' to create the final logical expression in subset, which is TRUE for every row of those studies

subset(h, study %in% study[outcome > 1])

Output:

  study outcome
1     a       1
2     a       2
3     a       1
6     c       3
7     c       3
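
The two steps above can also be written out one at a time, which may make the logic easier to follow. This is only a sketch of the same idea in base R; the intermediate names hits, keep and res are introduced here for illustration and are not part of the original answer.

```r
# Rebuild the example data from the question
h <- read.table(text = "
study outcome
a     1
a     2
a     1
b     1
b     1
c     3
c     3", header = TRUE)

# Step 1: studies that contain at least one outcome above 1
hits <- unique(as.character(h$study[h$outcome > 1]))

# Step 2: flag every row that belongs to one of those studies
keep <- h$study %in% hits

# Step 3: keep the flagged rows; the original row names are preserved
res <- h[keep, ]
res
```

Because subset evaluates its condition inside the data frame, the one-liner does the same thing without the h$ prefixes.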

To keep only the first 'n' studies that contain an 'outcome' value greater than 1, take the unique 'study' values from the first expression, use head to select the first 'n' of them, and then use %in% to create the logical expression for subset. With the example data only two studies (a and c) qualify, so n = 3 returns the same rows as above.

n <- 3
subset(h, study %in% head(unique(study[outcome > 1]), n))
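
To see the limit take effect on the example data, a smaller n is needed. The sketch below assumes n = 1 purely for illustration; the name first_n is hypothetical.

```r
# Rebuild the example data from the question
h <- read.table(text = "
study outcome
a     1
a     2
a     1
b     1
b     1
c     3
c     3", header = TRUE)

# Keep only the first qualifying study (in order of appearance)
n <- 1
first_n <- subset(h, study %in% head(unique(study[outcome > 1]), n))
first_n
```

Here only the rows of study a should remain, since a is the first study in which an outcome above 1 appears.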

Alternatively, use a group-by approach with any, which keeps a group when at least one of its rows satisfies the condition:

library(dplyr)
h %>%
    group_by(study) %>%
    filter(any(outcome > 1)) %>%
    ungroup()
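
If adding a package dependency is not desirable, the same group-wise test can be sketched in base R with ave. This is a swapped-in alternative rather than part of the original answer, and the names grp_any and res_ave are made up for the example.

```r
# Rebuild the example data from the question
h <- read.table(text = "
study outcome
a     1
a     2
a     1
b     1
b     1
c     3
c     3", header = TRUE)

# For each study, any() is computed over its rows and the result is
# repeated for every row of that study
grp_any <- ave(h$outcome > 1, h$study, FUN = any)

# Keep the rows whose study has at least one outcome above 1
res_ave <- h[as.logical(grp_any), ]
res_ave
```

Unlike the dplyr version, which returns a tibble, this keeps a plain data.frame with the original row names.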
