Reputation: 942

How to filter nested data

How can I filter a nested dataset so that only the rows whose nested value is exactly the same as a reference vector or tibble are kept?

library(tidyverse)

rev_vec <-  c("apple", "pear", "banana")

df <- tibble(
  ID= rep(1:3, each =3),
  fruits =  c("apple", "pear", "banana", 
              "Pineapple", "Pineapple", "orange",
              "lime", "pear", NA))

df_vec <- df %>% 
  group_by(ID) %>% 
  summarise(fruits  = list(unique(fruits)))

## This does not work
df_vec %>% 
  filter(fruits == rev_vec)

## This does not work
df_vec %>% 
  filter(unlist(fruits) == rev_vec)

## This does not work
df_vec %>% 
  filter(all(unlist(fruits[[1]]) ==rev_vec))

Basically, I just need to know which ID (in this case 1) matches the reference vector.

Expected outcome:

Only ID 1 matches rev_vec.

df_vec %>%
   filter(....)

# A tibble: 1 x 2
     ID fruits   
  <int> <list>   
1     1 <chr [3]>

Upvotes: 1

Answers (3)

Sergey Skripko

Reputation: 374

df_vec %>% 
    filter(map_lgl(fruits, ~setequal(., rev_vec)))

# A tibble: 1 x 2
     ID fruits   
  <int> <list>   
1     1 <chr [3]>
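
Note that setequal() ignores element order. If "exact same" is meant to include order, a minimal sketch of an order-sensitive variant swaps in identical(); the names ids_setequal and ids_identical below are made up for illustration, and the data is rebuilt from the question so the snippet runs on its own.

```r
library(dplyr)
library(purrr)
library(tibble)

rev_vec <- c("apple", "pear", "banana")

df <- tibble(
  ID = rep(1:3, each = 3),
  fruits = c("apple", "pear", "banana",
             "Pineapple", "Pineapple", "orange",
             "lime", "pear", NA)
)

# One row per ID, with the unique fruits stored in a list column
df_vec <- df %>%
  group_by(ID) %>%
  summarise(fruits = list(unique(fruits)))

# setequal(): same elements, order does not matter
ids_setequal <- df_vec %>%
  filter(map_lgl(fruits, ~ setequal(.x, rev_vec))) %>%
  pull(ID)

# identical(): same elements in the same order (and same type)
ids_identical <- df_vec %>%
  filter(map_lgl(fruits, ~ identical(.x, rev_vec))) %>%
  pull(ID)

# The two checks only disagree when the order differs
reversed <- rev(rev_vec)
same_set   <- setequal(reversed, rev_vec)   # order ignored
same_exact <- identical(reversed, rev_vec)  # order respected
```

With the question's data both filters keep ID 1; they would diverge only for a group holding the same fruits in a different order.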

Upvotes: 1

Ben

Reputation: 30474

You could use identical() to check whether the fruits for each ID exactly match the reference vector, including their order.

library(tidyverse)

df %>%
  group_by(ID) %>%
  filter(identical(fruits, rev_vec))

Output

     ID fruits
  <int> <chr> 
1     1 apple 
2     1 pear  
3     1 banana
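
If only the matching ID is needed rather than the rows, the same identical() check can be collapsed to one value per group. This is a sketch under that assumption; is_match and matching_ids are illustrative names, not part of the original answer.

```r
library(dplyr)
library(tibble)

rev_vec <- c("apple", "pear", "banana")

df <- tibble(
  ID = rep(1:3, each = 3),
  fruits = c("apple", "pear", "banana",
             "Pineapple", "Pineapple", "orange",
             "lime", "pear", NA)
)

# One logical per ID: TRUE when the group's fruits equal rev_vec exactly
matching_ids <- df %>%
  group_by(ID) %>%
  summarise(is_match = identical(fruits, rev_vec)) %>%
  filter(is_match) %>%
  pull(ID)

print(matching_ids)
```

Because this compares the raw column, a group with repeated fruits would not match unless unique() is applied first, as in the question's df_vec.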

Upvotes: 0

Sotos

Reputation: 51592

I'm not sure how you want the output structured, but here is an idea:

library(dplyr)

df %>% 
 group_by(ID) %>% 
 mutate(new = sum(fruits %in% rev_vec) == n())

# A tibble: 9 x 3
# Groups:   ID [3]
     ID fruits    new  
  <int> <chr>     <lgl>
1     1 apple     TRUE 
2     1 pear      TRUE 
3     1 banana    TRUE 
4     2 Pineapple FALSE
5     2 Pineapple FALSE
6     2 orange    FALSE
7     3 lime      FALSE
8     3 pear      FALSE
9     3 NA        FALSE

Another way to structure the output:

df %>% 
 group_by(ID) %>% 
 mutate(new = sum(fruits %in% rev_vec) == n()) %>% 
 filter(new) %>% 
 nest()

# A tibble: 1 x 2
# Groups:   ID [1]
     ID data            
  <int> <list>          
1     1 <tibble [3 x 2]>
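
One thing to keep in mind: the %in% test checks that every fruit in a group appears in rev_vec, not that the group contains all of rev_vec. The sketch below uses a made-up group (subset_only, hypothetical data that is not in the question) to show where that differs from a set comparison.

```r
rev_vec <- c("apple", "pear", "banana")

# Hypothetical group: every value is in rev_vec, but two fruits are missing
subset_only <- c("apple", "apple", "apple")

# The %in%-based test from above, written for a plain vector
in_check <- sum(subset_only %in% rev_vec) == length(subset_only)

# A set comparison also requires every element of rev_vec to be present
set_check <- setequal(subset_only, rev_vec)

print(c(in_check = in_check, set_check = set_check))
```

For the question's data the two tests agree, so this only matters if a group can be a strict subset of the reference vector.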

Upvotes: 0
