captcoma

Reputation: 1908

Filter by subgroup criteria (specify the occurrence of a value per group) using dplyr

I would like to filter a dataset and keep all groups that have exactly n rows (in my case 1 row) with a specific item.

library(dplyr)
df <- tibble(group=c("a","a","a","b","b","b"),
        item=c(1,2,2,1,1,3))

I know how to keep all groups that contain at least one row with item == 1, using any:

df %>% group_by(group) %>% 
  filter(any(item==1))

However, I do not know whether it is possible to specify the number of occurrences per group. I was thinking of something like this (pseudocode):

filter(n(item==1)==1)
filter(any(item==1,1))

Upvotes: 0

Views: 383

Answers (2)

akrun

Reputation: 887901

We can use data.table and subset .SD directly within each group:

library(data.table)
n <- 1
setDT(df)[, .SD[sum(item == 1) >= n], by = group]

Or with dplyr, using length:

library(dplyr)
df %>%
   group_by(group) %>% 
   filter(length(item[item==1]) >= n)

Upvotes: 1

Ronak Shah

Reputation: 389275

We could group_by group, count the occurrences of item == 1 in each group, and keep the groups with at least n occurrences (replace >= with == to require exactly n):

library(dplyr)
n <- 1

df %>%
  group_by(group) %>%
  filter(sum(item == 1) >= n)

Or using the same logic with base R ave:

df[with(df, ave(item == 1, group, FUN = sum) >= n), ]

And for completeness, one with data.table:

library(data.table)
setDT(df)[, if(sum(item == 1) >= n) .SD, by = group]
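
Since the question asks for groups with exactly n matching rows rather than at least n, here is a minimal sketch of that variant. It assumes the sample df from the question and only swaps >= for ==; the name exact is just illustrative. With this data it should keep the three rows of group a and drop group b, which contains item == 1 twice.

```r
library(dplyr)

# Sample data from the question
df <- tibble(group = c("a", "a", "a", "b", "b", "b"),
             item  = c(1, 2, 2, 1, 1, 3))
n <- 1

# Keep only the groups in which item == 1 occurs exactly n times
exact <- df %>%
  group_by(group) %>%
  filter(sum(item == 1) == n) %>%
  ungroup()

exact
```

The same swap should carry over to the ave and data.table versions, since all of them compare sum(item == 1) against n per group.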

Upvotes: 2
