Reputation: 942
How can I filter a nested dataset so that only the rows whose nested value is exactly the same as some reference vector (or tibble) are kept?
library(tidyverse)
rev_vec <- c("apple", "pear", "banana")
df <- tibble(
  ID = rep(1:3, each = 3),
  fruits = c("apple", "pear", "banana",
             "Pineapple", "Pineapple", "orange",
             "lime", "pear", NA))
df_vec <- df %>%
  group_by(ID) %>%
  summarise(fruits = list(unique(fruits)))
## This does not work
df_vec %>%
  filter(fruits == rev_vec)
## This does not work
df_vec %>%
  filter(unlist(fruits) == rev_vec)
## This does not work
df_vec %>%
  filter(all(unlist(fruits[[1]]) == rev_vec))
Basically, I just need to know which ID matches the reference vector. In this case only ID 1 matches rev_vec, so the expected result is:
df_vec %>%
filter(....)
# A tibble: 1 x 2
     ID fruits
  <int> <list>
1     1 <chr [3]>
Upvotes: 1
Views: 300
Reputation: 374
df_vec %>%
  filter(map_lgl(fruits, ~ setequal(., rev_vec)))
# A tibble: 1 x 2
     ID fruits
  <int> <list>
1     1 <chr [3]>
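Note that setequal() ignores both order and duplicates, so it would also keep a group holding c("banana", "apple", "pear"). If "exact same" should include order, here is a minimal sketch that swaps in identical(). It rebuilds a df_vec shaped like the one in the question so the snippet runs on its own; the extra ID 4 is hypothetical and only there to show the difference.

```r
library(dplyr)
library(purrr)
library(tibble)

rev_vec <- c("apple", "pear", "banana")

# Same shape as the question's df_vec, plus a hypothetical ID 4 that holds
# the reference fruits in a different order.
df_vec <- tibble(
  ID = 1:4,
  fruits = list(
    c("apple", "pear", "banana"),
    c("Pineapple", "orange"),
    c("lime", "pear", NA),
    c("banana", "apple", "pear")
  )
)

# Order-insensitive comparison: keeps IDs 1 and 4
ids_setequal <- df_vec %>%
  filter(map_lgl(fruits, ~ setequal(.x, rev_vec))) %>%
  pull(ID)

# Order-sensitive comparison: keeps only ID 1
ids_identical <- df_vec %>%
  filter(map_lgl(fruits, ~ identical(.x, rev_vec))) %>%
  pull(ID)
```

Which of the two fits depends on whether the order of the nested values carries meaning in your data.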
Upvotes: 1
Reputation: 30474
Perhaps you could try using identical() to see if the fruits for each ID are exactly identical to the reference vector.
library(tidyverse)
df %>%
  group_by(ID) %>%
  filter(identical(fruits, rev_vec))
Output
     ID fruits
  <int> <chr>
1     1 apple
2     1 pear
3     1 banana
Upvotes: 0
Reputation: 51592
Not sure how you want the output structured, but here is an idea:
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(new = sum(fruits %in% rev_vec) == n())
# A tibble: 9 x 3
# Groups:   ID [3]
     ID fruits    new
  <int> <chr>     <lgl>
1     1 apple     TRUE
2     1 pear      TRUE
3     1 banana    TRUE
4     2 Pineapple FALSE
5     2 Pineapple FALSE
6     2 orange    FALSE
7     3 lime      FALSE
8     3 pear      FALSE
9     3 NA        FALSE
Another possible output, with the matching group nested:
df %>%
  group_by(ID) %>%
  mutate(new = sum(fruits %in% rev_vec) == n()) %>%
  filter(new) %>%
  nest()
# A tibble: 1 x 2
# Groups:   ID [1]
     ID data
  <int> <list>
1     1 <tibble [3 x 2]>
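One caveat with sum(fruits %in% rev_vec) == n(): it only checks that every fruit in the group appears in rev_vec, so a group made up solely of "apple" would pass as well. A hedged sketch of a stricter check follows, which also requires every reference fruit to be present in the group. The ID 4 rows are hypothetical and were added to the question's data only to show the difference.

```r
library(dplyr)
library(tibble)

rev_vec <- c("apple", "pear", "banana")

df <- tibble(
  ID = rep(1:4, each = 3),
  fruits = c("apple", "pear", "banana",
             "Pineapple", "Pineapple", "orange",
             "lime", "pear", NA,
             "apple", "apple", "apple")  # hypothetical extra group
)

checked <- df %>%
  group_by(ID) %>%
  summarise(
    # TRUE when the group contains nothing outside rev_vec
    subset_only = all(fruits %in% rev_vec),
    # TRUE only when, in addition, no reference fruit is missing
    both_ways = all(fruits %in% rev_vec) && all(rev_vec %in% fruits)
  )
```

Here subset_only is TRUE for IDs 1 and 4, while both_ways is TRUE for ID 1 only. Like setequal(), this two-way check still ignores order and duplicates.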
Upvotes: 0