Reputation: 21
I have a dataframe that looks like this:
df <- data.frame(id=c("list1", "list2"))
df$Content <- list(c("A", "B", "C"), c("A", "B", "A"))
For each row in "Content", I would like to first remove duplicates, then find all rows containing certain elements; for example, searching for "A" should return both rows 1 and 2.
I've tried using duplicated() with apply(), but it seems to be finding duplicates at the list level, as in, does c("A", "B", "C") match c("A", "B", "A"), instead of finding duplicates within each list.
Similarly, I'm having trouble checking for the presence of a specific element within each list, rather than matching against the list as a whole.
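For reference, this is the kind of behaviour I'm seeing (a rough reconstruction; my exact calls may have differed):
# duplicated() on the list column compares the vectors as wholes,
# so nothing is flagged even though "A" repeats inside the second vector
duplicated(df$Content)
# [1] FALSE FALSE
# likewise, %in% matches against the whole vectors rather than their contents
"A" %in% df$Content
# [1] FALSE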
The only thing I could think of is using a for loop, but I was wondering if there's a more elegant way to do this.
Upvotes: 1
Views: 39
Reputation: 887058
We can use map to loop over the list elements and return the unique elements, then filter the rows of the dataset where there is an 'A' in 'Content'.
library(dplyr)
library(purrr)
df %>%
  mutate(Content = map(Content, unique)) %>%   # drop duplicates within each list element
  filter(map_lgl(Content, ~ 'A' %in% .x))      # keep rows whose Content contains "A"
# id Content
#1 list1 A, B, C
#2 list2 A, B
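As a quick check, the map_lgl condition can be evaluated on its own; it returns one logical per row (a sketch, run here on the original Content):
map_lgl(df$Content, ~ 'A' %in% .x)
# [1] TRUE TRUE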
Or another option is to unnest the list column, do a group_by filter on the distinct rows, and then condense (from the development version of dplyr) or summarise back into a list column:
library(tidyr)
df %>%
  unnest(c(Content)) %>%            # one row per element of Content
  distinct() %>%                    # drop repeated id/Content pairs
  group_by(id) %>%
  filter('A' %in% Content) %>%      # keep groups that contain "A"
  condense(Content)                 # collapse back to a list column (dplyr devel only)
# A tibble: 2 x 2
# Rowwise: id
# id Content
# <fct> <list>
#1 list1 <chr [3]>
#2 list2 <chr [2]>
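As noted, condense() comes from the development version of dplyr; if it isn't available, the summarise route mentioned above gives the same list column (a sketch):
df %>%
  unnest(c(Content)) %>%
  distinct() %>%
  group_by(id) %>%
  filter('A' %in% Content) %>%
  summarise(Content = list(Content))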
Upvotes: 1