Reputation: 641
I would like to collapse values across rows within a group and remove duplicates and NAs. I have tried several {tidyverse}
approaches, including purrr::nest
, dplyr::summarize(x = paste(x, collapse = ", ") and
dplyr::summarize(x = list(x)`, and haven't gotten it to work. I would be grateful for your help! A reprex of the input and desired output is below.
# Collapse rows across group and remove duplicates and NAs
library(dplyr)
df_in <- tribble(
~group, ~subgroup, ~color, ~shape, ~emotion, ~shade,
1, "a", "red", NA, "happy", NA,
1, "a", "red", NA, "sad", "striped"
)
df_in
#> # A tibble: 2 × 6
#> group subgroup color shape emotion shade
#> <dbl> <chr> <chr> <lgl> <chr> <chr>
#> 1 1 a red NA happy <NA>
#> 2 1 a red NA sad striped
df_out <- tribble(
~group, ~subgroup, ~color, ~shape, ~emotion, ~shade,
1, "a", "red", NA, "happy, sad", "striped"
)
df_out
#> # A tibble: 1 × 6
#> group subgroup color shape emotion shade
#> <dbl> <chr> <chr> <lgl> <chr> <chr>
#> 1 1 a red NA happy, sad striped
Created on 2021-11-19 by the reprex package (v2.0.0)
Upvotes: 0
Views: 1375
Reputation: 4658
We can use group_by
with summarise(across(everything(), ...))
to apply a function to every column. That function in our case is written as a formula (the ~
-notation), in which the column is called .x
.
As you suggested, we can paste
(with collapse = ", "
) the rows together. I remove the NA
values with .x[!is.na(.x)]
.
df_in %>%
group_by(group, subgroup) %>%
summarise(across(everything(), ~ paste(unique(.x[!is.na(.x)]), collapse = ", "))) %>%
ungroup()
The only difference with your expected output is that the shape
column is now an empty string instead of an NA
value:
# A tibble: 1 x 6
group subgroup color shape emotion shade
<dbl> <chr> <chr> <chr> <chr> <chr>
1 1 a red "" happy, sad striped
That can be fixed by for example creating a function that replaces the zero-length list with NA
before pasting.
paste_rows <- function(x) {
unique_x <- unique(x[!is.na(x)])
if (length(unique_x) == 0) {
unique_x <- NA
}
paste(unique_x, collapse = ", ")
}
df_in %>%
group_by(group, subgroup) %>%
summarise(across(everything(), paste_rows)) %>%
ungroup()
Upvotes: 2