Reputation: 13852

Filter a dataframe with dplyr

I have this data.frame:

df <- data.frame(
    id = c("x1", "x2", "x3", "x4", "x5", "x1", "x2", "x6", "x7", "x8", "x7", "x8" ), 
    age = c(rep("juvenile", 5), rep("adult", 7))
    )
df 

   id      age
1  x1 juvenile
2  x2 juvenile
3  x3 juvenile
4  x4 juvenile
5  x5 juvenile
6  x1    adult
7  x2    adult
8  x6    adult
9  x7    adult
10 x8    adult
11 x7    adult
12 x8    adult

Each row represents a sighting of an individual. I want to pull out all rows for individuals that were seen as juveniles and seen again as adults. I do not want rows for individuals that were first seen as adults and then seen again as adults (ids x7 and x8). So the resulting data.frame should be this:

  id      age
1 x1 juvenile
2 x2 juvenile
3 x1    adult
4 x2    adult

I'm specifically after a dplyr solution.

Upvotes: 3

Answers (3)

alex23lemm

Reputation: 5675

Here is a more general dplyr solution, which may be useful if you later need more specific thresholds (for example, a minimum number of sightings at each age):

library(dplyr)

df %>% 
  group_by(id) %>% 
  filter(sum(age == 'juvenile') >= 1 & sum(age == 'adult') >= 1)

# Source: local data frame [4 x 2]
# Groups: id
# 
# id      age
# 1 x1 juvenile
# 2 x2 juvenile
# 3 x1    adult
# 4 x2    adult
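To make the threshold idea concrete, here is a minimal sketch that wraps the same `filter()` in a function. The function name `keep_by_sightings()` and its arguments `min_juvenile` and `min_adult` are hypothetical, made up for illustration; they are not part of dplyr:

```r
library(dplyr)

df <- data.frame(
  id  = c("x1", "x2", "x3", "x4", "x5", "x1", "x2", "x6", "x7", "x8", "x7", "x8"),
  age = c(rep("juvenile", 5), rep("adult", 7)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep every row of an id that was seen at least
# `min_juvenile` times as a juvenile AND at least `min_adult` times as an adult.
keep_by_sightings <- function(data, min_juvenile = 1, min_adult = 1) {
  data %>%
    group_by(id) %>%
    filter(sum(age == "juvenile") >= min_juvenile,
           sum(age == "adult") >= min_adult) %>%
    ungroup()  # drop the grouping left over from group_by()
}

# Same rows as above: x1 and x2, seen both as juveniles and as adults
keep_by_sightings(df)

# Ids seen at least twice as adults, regardless of juvenile sightings (x7, x8)
keep_by_sightings(df, min_juvenile = 0, min_adult = 2)
```

With both thresholds at 1 this reproduces the answer above; a grouped `filter()` keeps the surviving rows in their original order.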

Upvotes: 4

colemand77

Reputation: 551

I think this is what you're looking for. I broke it down into steps for exposition; you can make it more compact by not assigning the result of each filter to its own variable.

kids <- df %>%
  filter(age == "juvenile")

adults <- df %>%
  filter(age == "adult")

# one row per id seen both as a juvenile and as an adult; columns: id, age.x, age.y
repeat_offender <- inner_join(kids, adults, by = "id")
repeat_offender

To return the result in the long format requested in the question:

library(tidyr)  # provides gather()

this_solution_sucks <- gather(repeat_offender, agex, age, -id) %>% select(-agex)
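A sketch of the more compact version mentioned above, written as a single pipeline without intermediate assignments. It assumes tidyr 1.0.0 or later, where `gather()` is superseded by `pivot_longer()`. Because `pivot_longer()` interleaves the two age columns per id, an `arrange()` restores the row order shown in the question; the column name `source` is an arbitrary choice:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id  = c("x1", "x2", "x3", "x4", "x5", "x1", "x2", "x6", "x7", "x8", "x7", "x8"),
  age = c(rep("juvenile", 5), rep("adult", 7)),
  stringsAsFactors = FALSE
)

result <- df %>%
  filter(age == "juvenile") %>%
  # pair each juvenile sighting with the adult sightings of the same id;
  # the two age columns come out as age.x and age.y
  inner_join(filter(df, age == "adult"), by = "id") %>%
  # stack age.x and age.y back into a single age column
  pivot_longer(-id, names_to = "source", values_to = "age") %>%
  select(-source) %>%
  # juveniles first ("juvenile" sorts after "adult"), then by id
  arrange(desc(age), id)

result
```

As with the `gather()` version, an id with several adult sightings would produce a duplicated juvenile row, because the join pairs the juvenile row with each adult row. The `group_by()`/`filter()` answers return the original rows and do not have this issue.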

Upvotes: 2

Marat Talipov

Reputation: 13314

You can group by id and select only those groups that contain both 'juvenile' and 'adult':

df %>% 
   group_by(id) %>% 
   filter(all(c('juvenile','adult') %in% age))

#Source: local data frame [4 x 2]
#Groups: id
#
#  id      age
#1 x1 juvenile
#2 x2 juvenile
#3 x1    adult
#4 x2    adult
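To see why this works, you can swap `filter()` for `summarise()` and look at the condition computed for each group. This is only a diagnostic sketch; the column name `has_both` is arbitrary:

```r
library(dplyr)

df <- data.frame(
  id  = c("x1", "x2", "x3", "x4", "x5", "x1", "x2", "x6", "x7", "x8", "x7", "x8"),
  age = c(rep("juvenile", 5), rep("adult", 7)),
  stringsAsFactors = FALSE
)

# One row per id: TRUE when that id's sightings include both stages
check <- df %>%
  group_by(id) %>%
  summarise(has_both = all(c("juvenile", "adult") %in% age))

check
```

Only x1 and x2 get `TRUE`, and `filter()` keeps every row of those groups. Note that `%in%` only tests presence, so unlike the `sum()` version it cannot express minimum counts.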

Upvotes: 6
