Using the unite function in R and removing duplicated values

Question

I'm trying to use the unite function in R to concatenate values across columns, but also deduplicate the values. How can I accomplish this?

Here is the input data:

library(tibble)

input <- tibble(
  id = c('aa', 'ss', 'dd', 'qq'),
  '2017' = c('tv', NA, NA, 'web'),
  '2018' = c('tv', 'web', NA, NA),
  '2019' = c(NA, 'web', 'book', 'tv')
)

# A tibble: 4 x 4
  id    `2017` `2018` `2019`
  <chr> <chr>  <chr>  <chr> 
1 aa    tv     tv     NA    
2 ss    NA     web    web    
3 dd    NA     NA     book  
4 qq    web    NA     tv

The desired output with the ALL column is:

> output
# A tibble: 4 x 5
  id    `2017` `2018` `2019` ALL   
  <chr> <chr>  <chr>  <chr>  <chr> 
1 aa    tv     tv     NA     tv    
2 ss    NA     web    web    web   
3 dd    NA     NA     book   book  
4 qq    web    NA     tv     web, tv

Ronak Shah · Accepted Answer

I am not sure whether unite can deduplicate values, but you can use apply row-wise:

input$ALL <- apply(input[-1], 1, function(x) toString(na.omit(unique(x))))
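
To see what the anonymous function does to each row, here is a minimal base R sketch on a single vector. The vector x is a made-up row (the qq values with an extra repeated "web"), used only for illustration.

```r
# Hypothetical single row: the "qq" values plus one repeated "web"
x <- c("web", NA, "tv", "web")

step1 <- unique(x)       # drops the repeated "web"; one NA remains
step2 <- na.omit(step1)  # drops the NA
all_value <- toString(step2)  # collapses the remaining values with ", "

print(all_value)
# [1] "web, tv"
```

apply() converts the selected columns to a character matrix and calls this function once per row, so each row yields one such string.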

Or, a tidyverse way using pmap_chr:

library(tidyverse)

input %>%
  mutate(ALL = pmap_chr(select(., -id), ~toString(unique(na.omit(c(...))))))

#  id    `2017` `2018` `2019` ALL    
#  <chr> <chr>  <chr>  <chr>  <chr>  
#1 aa    tv     tv     NA     tv     
#2 ss    NA     web    web    web    
#3 dd    NA     NA     book   book   
#4 qq    web    NA     tv     web, tv

Or reshape the data to long format, summarise per id, and then join the result back:

input %>%
  pivot_longer(cols = -id, values_drop_na = TRUE) %>%
  group_by(id) %>%
  summarise(ALL = toString(unique(value))) %>%
  # join onto input so its row and column order are kept, with ALL last
  left_join(input, ., by = "id")
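
Since the question asks about unite specifically, one possible sketch is to let unite do the concatenation and deduplicate the resulting string afterwards. This is not part of the answer above. The helper dedupe is a hypothetical name introduced here, and the approach assumes no value contains the separator ", " itself and that tidyr is recent enough to support the na.rm argument of unite.

```r
library(tibble)
library(tidyr)
library(dplyr)

input <- tibble(
  id = c("aa", "ss", "dd", "qq"),
  `2017` = c("tv", NA, NA, "web"),
  `2018` = c("tv", "web", NA, NA),
  `2019` = c(NA, "web", "book", "tv")
)

# Hypothetical helper: split a united string on ", " and keep unique pieces
dedupe <- function(s) {
  toString(unique(strsplit(s, ", ", fixed = TRUE)[[1]]))
}

result <- input %>%
  # na.rm = TRUE drops NA values; remove = FALSE keeps the year columns
  unite("ALL", -id, sep = ", ", na.rm = TRUE, remove = FALSE) %>%
  # unite alone leaves repeats such as "tv, tv", so deduplicate each string
  mutate(ALL = vapply(ALL, dedupe, character(1), USE.NAMES = FALSE)) %>%
  # move ALL to the end to match the desired layout
  relocate(ALL, .after = last_col())

print(result$ALL)
# [1] "tv"      "web"     "book"    "web, tv"
```

The row-wise approaches above avoid the split-and-rejoin step, so they are probably the safer choice when values may contain commas.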
