maia-sh
maia-sh

Reputation: 641

Collapse rows across group and remove duplicates and NAs

I would like to collapse values across rows within a group and remove duplicates and NAs. I have tried several {tidyverse} approaches, including purrr::nest, dplyr::summarize(x = paste(x, collapse = ", ") and dplyr::summarize(x = list(x)`, and haven't gotten it to work. I would be grateful for your help! A reprex of the input and desired output is below.

# Collapse rows across group and remove duplicates and NAs

library(dplyr)

df_in <- tribble(
  ~group, ~subgroup, ~color, ~shape, ~emotion, ~shade,
  1,      "a",       "red",   NA,   "happy",   NA,
  1,      "a",       "red",   NA,   "sad",   "striped"
)

df_in
#> # A tibble: 2 × 6
#>   group subgroup color shape emotion shade  
#>   <dbl> <chr>    <chr> <lgl> <chr>   <chr>  
#> 1     1 a        red   NA    happy   <NA>   
#> 2     1 a        red   NA    sad     striped


df_out <- tribble(
  ~group, ~subgroup, ~color, ~shape, ~emotion,    ~shade,
  1,      "a",       "red",   NA,   "happy, sad", "striped"
)

df_out
#> # A tibble: 1 × 6
#>   group subgroup color shape emotion    shade  
#>   <dbl> <chr>    <chr> <lgl> <chr>      <chr>  
#> 1     1 a        red   NA    happy, sad striped

Created on 2021-11-19 by the reprex package (v2.0.0)

Upvotes: 0

Views: 1375

Answers (1)

Bas
Bas

Reputation: 4658

We can use group_by with summarise(across(everything(), ...)) to apply a function to every column. That function in our case is written as a formula (the ~-notation), in which the column is called .x.

As you suggested, we can paste (with collapse = ", ") the rows together. I remove the NA values with .x[!is.na(.x)].

df_in %>% 
  group_by(group, subgroup) %>% 
  summarise(across(everything(), ~ paste(unique(.x[!is.na(.x)]), collapse = ", "))) %>% 
  ungroup()

The only difference with your expected output is that the shape column is now an empty string instead of an NA value:

# A tibble: 1 x 6
  group subgroup color shape emotion    shade  
  <dbl> <chr>    <chr> <chr> <chr>      <chr>  
1     1 a        red   ""    happy, sad striped

That can be fixed by for example creating a function that replaces the zero-length list with NA before pasting.

paste_rows <- function(x) {
  unique_x <- unique(x[!is.na(x)])
  if (length(unique_x) == 0) {
    unique_x <- NA
  }
  
  paste(unique_x, collapse = ", ")
}

df_in %>% 
  group_by(group, subgroup) %>% 
  summarise(across(everything(), paste_rows)) %>% 
  ungroup()

Upvotes: 2

Related Questions