Reputation: 21
I have a dataframe that looks like this:
df <- data.frame(id=c("list1", "list2"))
df$Content <- list(c("A", "B", "C"), c("A", "B", "A"))
For each row in "Content", I would like to first remove duplicates, then find all rows containing certain elements; for example, searching for "A" should return both rows 1 and 2.
I've tried using duplicated() with apply(), but it seems to be finding duplicates at the list level, as in, does c("A", "B", "C") match c("A", "B", "A"), instead of finding duplicates within each list.
Similarly, I'm having trouble checking for the presence of a specific element within each list, rather than matching against the list as a whole.
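For reference, this is the kind of behaviour I'm seeing (a rough reconstruction; my exact calls may have differed):
# duplicated() on the list column compares the vectors as wholes,
# so nothing is flagged even though "A" repeats inside the second vector
duplicated(df$Content)
# [1] FALSE FALSE
# likewise, %in% matches against the whole vectors rather than their contents
"A" %in% df$Content
# [1] FALSE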
The only thing I could think of is using a for loop, but I was wondering if there's a more elegant way to do this.
Upvotes: 1
Views: 39
Reputation: 887058
We can use map to loop over the list elements and return the unique elements, then filter the rows of the dataset where there is an 'A' in 'Content'.
library(dplyr)
library(purrr)
df %>%
  mutate(Content = map(Content, unique)) %>%   # drop duplicates within each list element
  filter(map_lgl(Content, ~ 'A' %in% .x))      # keep rows whose Content contains "A"
# id Content
#1 list1 A, B, C
#2 list2 A, B
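As a quick check, the map_lgl condition can be evaluated on its own; it returns one logical per row (a sketch, run here on the original Content):
map_lgl(df$Content, ~ 'A' %in% .x)
# [1] TRUE TRUE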
Or another option is to unnest the list column, do a group_by filter on the distinct rows, and then condense (from the development version of dplyr) or summarise back into a list column:
library(tidyr)
df %>%
  unnest(c(Content)) %>%            # one row per element of Content
  distinct() %>%                    # drop repeated id/Content pairs
  group_by(id) %>%
  filter('A' %in% Content) %>%      # keep groups that contain "A"
  condense(Content)                 # collapse back to a list column (dplyr devel only)
# A tibble: 2 x 2
# Rowwise: id
# id Content
# <fct> <list>
#1 list1 <chr [3]>
#2 list2 <chr [2]>
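As noted, condense() comes from the development version of dplyr; if it isn't available, the summarise route mentioned above gives the same list column (a sketch):
df %>%
  unnest(c(Content)) %>%
  distinct() %>%
  group_by(id) %>%
  filter('A' %in% Content) %>%
  summarise(Content = list(Content))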
Upvotes: 1