daily
daily

Reputation: 215

R: Filter a Dataframe with a list on matching values of a list-column

I'm new to R and have a Problem I just don't seem to get solved..

I have a table like this:

issue | teams
-------------
A     | team-A,team-B
B     | team-C
C     | team-A
D     | team-A,team-B
E     | team-B

The column teams is of the type "list" Now I have another "list" with teams the current user is in. For example "team-B,team-C". What I want to achive is to filter the dataframe that it only shows rows where the teams the current user is in have at least on match in the teams-column. In this examle the output should be:

issue | teams
-------------
A     | team-A,team-B
B     | team-C
D     | team-A,team-B
E     | team-B

I tried with intersect which gives me on a length-check a boolean ("length(intersect(my_teams,teams)) != 0"). But if I use this in a dplyr-Filter it doesn't work. I think I didn't understand the data-structures with lists, vectors, dataframes and so on fully and their different behaviour in dataframes. As I said I'm just starting with R. Could someone give me a hand or hint how I can solve this filtering? I'm sure it's pretty easy, but I'm out of google-Skills ^^

Thanks in Advance!

Minimal Example:

library(tidyverse)

issue <- c("A","A","B","C","D","D","E")
team <- c("team-A","team-B","team-C","team-A","team-A","team-B","team-B")

df <- data.frame(issue,team) %>% 
  group_by(issue) %>%
  summarise(teams = list(team))

glimpse(df)
dput(df)

Output:

> library(tidyverse)

> issue <- c("A","A","B","C","D","D","E")

> team <- c("team-A","team-B","team-C","team-A","team-A","team-B","team-B")

> df <- data.frame(issue,team,row.names = c(1,2,3,4,5,6,7)) %>% 
+   group_by(issue) %>%
+   summarise(teams = list(team))

> glimpse(df)
Rows: 5
Columns: 2
$ issue <chr> "A", "B", "C", "D", "E"
$ teams <list> <"team-A", "team-B">, "team-C", "team-A", <"team-A", "team-B">, "team-B"

> dput(df)
structure(list(issue = c("A", "B", "C", "D", "E"), teams = list(
    c("team-A", "team-B"), "team-C", "team-A", c("team-A", "team-B"
    ), "team-B")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L))

Session-Info:

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] forcats_0.5.2   stringr_1.4.1   dplyr_1.0.10    purrr_0.3.5     readr_2.1.3     tidyr_1.2.1     tibble_3.1.8    ggplot2_3.4.0  
[9] tidyverse_1.3.2

loaded via a namespace (and not attached):
 [1] pillar_1.8.1        compiler_4.2.2      cellranger_1.1.0    dbplyr_2.2.1        tools_4.2.2         timechange_0.1.1    lubridate_1.9.0    
 [8] jsonlite_1.8.3      googledrive_2.0.0   lifecycle_1.0.3     gargle_1.2.1        gtable_0.3.1        pkgconfig_2.0.3     rlang_1.0.6        
[15] reprex_2.0.2        DBI_1.1.3           cli_3.4.1           rstudioapi_0.14     haven_2.5.1         xml2_1.3.3          withr_2.5.0        
[22] httr_1.4.4          generics_0.1.3      vctrs_0.5.1         fs_1.5.2            hms_1.1.2           googlesheets4_1.0.1 grid_4.2.2         
[29] tidyselect_1.2.0    glue_1.6.2          R6_2.5.1            fansi_1.0.3         readxl_1.4.1        tzdb_0.3.0          modelr_0.1.10      
[36] magrittr_2.0.3      ellipsis_0.3.2      backports_1.4.1     scales_1.2.1        rvest_1.0.3         assertthat_0.2.1    colorspace_2.0-3   
[43] utf8_1.2.2          stringi_1.7.8       munsell_0.5.0       broom_1.0.1         crayon_1.5.2

Upvotes: 1

Views: 2837

Answers (1)

akrun
akrun

Reputation: 887108

If we have a list column, we can loop over the list with sapply to create a logical vector for subsetting

v1 <- c("team-B", "team-C")
subset(df1, sapply(teams, \(x) any(v1 %in% x)))

-output

 issue          teams
1     A team-A, team-B
2     B         team-C
4     D team-A, team-B
5     E         team-B

Or using tidyverse

library(dplyr)
library(purrr)
df1 %>% 
  filter(map_lgl(teams, ~ any(v1 %in% .x)))
  issue          teams
1     A team-A, team-B
2     B         team-C
3     D team-A, team-B
4     E         team-B

data

df1 <- structure(list(issue = c("A", "B", "C", "D", "E"), teams = list(
    c("team-A", "team-B"), "team-C", "team-A", c("team-A", "team-B"
    ), "team-B")), row.names = c(NA, -5L), class = "data.frame")

Upvotes: 3

Related Questions