Reputation: 215
I'm new to R and have a Problem I just don't seem to get solved..
I have a table like this:
issue | teams
-------------
A | team-A,team-B
B | team-C
C | team-A
D | team-A,team-B
E | team-B
The column teams is of the type "list" Now I have another "list" with teams the current user is in. For example "team-B,team-C". What I want to achive is to filter the dataframe that it only shows rows where the teams the current user is in have at least on match in the teams-column. In this examle the output should be:
issue | teams
-------------
A | team-A,team-B
B | team-C
D | team-A,team-B
E | team-B
I tried with intersect which gives me on a length-check a boolean ("length(intersect(my_teams,teams)) != 0"). But if I use this in a dplyr-Filter it doesn't work. I think I didn't understand the data-structures with lists, vectors, dataframes and so on fully and their different behaviour in dataframes. As I said I'm just starting with R. Could someone give me a hand or hint how I can solve this filtering? I'm sure it's pretty easy, but I'm out of google-Skills ^^
Thanks in Advance!
Minimal Example:
library(tidyverse)
issue <- c("A","A","B","C","D","D","E")
team <- c("team-A","team-B","team-C","team-A","team-A","team-B","team-B")
df <- data.frame(issue,team) %>%
group_by(issue) %>%
summarise(teams = list(team))
glimpse(df)
dput(df)
Output:
> library(tidyverse)
> issue <- c("A","A","B","C","D","D","E")
> team <- c("team-A","team-B","team-C","team-A","team-A","team-B","team-B")
> df <- data.frame(issue,team,row.names = c(1,2,3,4,5,6,7)) %>%
+ group_by(issue) %>%
+ summarise(teams = list(team))
> glimpse(df)
Rows: 5
Columns: 2
$ issue <chr> "A", "B", "C", "D", "E"
$ teams <list> <"team-A", "team-B">, "team-C", "team-A", <"team-A", "team-B">, "team-B"
> dput(df)
structure(list(issue = c("A", "B", "C", "D", "E"), teams = list(
c("team-A", "team-B"), "team-C", "team-A", c("team-A", "team-B"
), "team-B")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-5L))
Session-Info:
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10 purrr_0.3.5 readr_2.1.3 tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0
[9] tidyverse_1.3.2
loaded via a namespace (and not attached):
[1] pillar_1.8.1 compiler_4.2.2 cellranger_1.1.0 dbplyr_2.2.1 tools_4.2.2 timechange_0.1.1 lubridate_1.9.0
[8] jsonlite_1.8.3 googledrive_2.0.0 lifecycle_1.0.3 gargle_1.2.1 gtable_0.3.1 pkgconfig_2.0.3 rlang_1.0.6
[15] reprex_2.0.2 DBI_1.1.3 cli_3.4.1 rstudioapi_0.14 haven_2.5.1 xml2_1.3.3 withr_2.5.0
[22] httr_1.4.4 generics_0.1.3 vctrs_0.5.1 fs_1.5.2 hms_1.1.2 googlesheets4_1.0.1 grid_4.2.2
[29] tidyselect_1.2.0 glue_1.6.2 R6_2.5.1 fansi_1.0.3 readxl_1.4.1 tzdb_0.3.0 modelr_0.1.10
[36] magrittr_2.0.3 ellipsis_0.3.2 backports_1.4.1 scales_1.2.1 rvest_1.0.3 assertthat_0.2.1 colorspace_2.0-3
[43] utf8_1.2.2 stringi_1.7.8 munsell_0.5.0 broom_1.0.1 crayon_1.5.2
Upvotes: 1
Views: 2837
Reputation: 887108
If we have a list
column, we can loop over the list
with sapply
to create a logical vector
for subset
ting
v1 <- c("team-B", "team-C")
subset(df1, sapply(teams, \(x) any(v1 %in% x)))
-output
issue teams
1 A team-A, team-B
2 B team-C
4 D team-A, team-B
5 E team-B
Or using tidyverse
library(dplyr)
library(purrr)
df1 %>%
filter(map_lgl(teams, ~ any(v1 %in% .x)))
issue teams
1 A team-A, team-B
2 B team-C
3 D team-A, team-B
4 E team-B
df1 <- structure(list(issue = c("A", "B", "C", "D", "E"), teams = list(
c("team-A", "team-B"), "team-C", "team-A", c("team-A", "team-B"
), "team-B")), row.names = c(NA, -5L), class = "data.frame")
Upvotes: 3