otwtm

Reputation: 1999

Filter dataframe by values being subset of a given set

Let's say we have several persons, each of whom does a number of activities.

data <- data.frame(person=c('A','A','A','B','B','B','C','C'), activity=c(1,2,3,1,2,3,1,2))

I would like to keep only the persons who do nothing but 'relevant activities', where the relevant activities are defined by another vector.

relevant_activities <- c(1,2)

Hence, a person's activity values need to be a subset of the relevant activities.

Expected outcome:

  person activity
1      C        1
2      C        2

I tried something like this, without success:

library(dplyr)
data %>%
  group_by(person) %>%
  filter(all(relevant_activities %in% activity))

Upvotes: 1

Views: 76

Answers (2)

jogo

Reputation: 12559

Here is a solution with data.table:

library("data.table")

D <- data.table(person=c('A','A','A','B','B','B','C','C'), activity=c(1,2,3,1,2,3,1,2))
relevant_activities <- c(1,2)

D[person %in% D[, all(activity %in% relevant_activities), person][, person[V1]]]

Or, with a key set on the data.table:

D <- data.table(person=c('A','A','A','B','B','B','C','C'),
        activity=c(1,2,3,1,2,3,1,2), key="person")
relevant_activities <- c(1,2)

D[D[, all(activity %in% relevant_activities), person][, person[V1]]]
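The one-liners above can be unpacked into separate steps, which may make the logic easier to follow. This is only a sketch of the same idea; the names flags, ok, keep and result are made up for illustration and are not part of the original answer.

```r
library(data.table)

D <- data.table(person = c('A','A','A','B','B','B','C','C'),
                activity = c(1,2,3,1,2,3,1,2))
relevant_activities <- c(1, 2)

# Step 1: one row per person, with a logical flag that is TRUE only if
# every activity of that person is in relevant_activities.
# (The flag is named `ok` here instead of the default column name V1.)
flags <- D[, .(ok = all(activity %in% relevant_activities)), by = person]

# Step 2: extract the persons whose flag is TRUE.
keep <- flags[(ok), person]

# Step 3: keep only the rows belonging to those persons.
result <- D[person %in% keep]
print(result)
```

Naming the flag column avoids relying on the auto-generated V1, at the cost of a few extra lines.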

Upvotes: 0

akrun

Reputation: 887078

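The operands of %in% are reversed in your attempt. Test whether each activity is in relevant_activities, and wrap that with all: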

library(dplyr)
data %>%
  group_by(person) %>%
  filter(all(activity %in% relevant_activities))
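To see why the direction of %in% matters, here is a small base R sketch. It first compares the two directions for person A's activities, then applies the same per-person check without dplyr. The names activity_A, ok_by_person, keep and result are illustrative only, and the tapply approach is just one possible base R equivalent, not something taken from the answer above.

```r
data <- data.frame(person = c('A','A','A','B','B','B','C','C'),
                   activity = c(1,2,3,1,2,3,1,2))
relevant_activities <- c(1, 2)

# Person A does activities 1, 2 and 3.
activity_A <- data$activity[data$person == 'A']

# "Are all relevant activities done by A?" (the direction in the question)
all(relevant_activities %in% activity_A)   # TRUE
# "Are all of A's activities relevant?" (the subset condition that is wanted)
all(activity_A %in% relevant_activities)   # FALSE

# Base R version of the grouped filter: one logical per person ...
ok_by_person <- tapply(data$activity %in% relevant_activities,
                       data$person, all)
# ... expanded back to one logical per row, then used to subset.
keep <- as.vector(ok_by_person[as.character(data$person)])
result <- data[keep, ]
print(result)
```

Only person C passes the check, because A and B both include activity 3.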

Upvotes: 1
