combine duplicates and keep unique elements using dplyr::distinct

Question

I'd like to combine rows, delete dups, but keep uniques. I can do the first two, but not the second. I have a df called subject which looks like this:

unique_id   subject     grade
1            Math         88
1            English      78
1            History      98
2            Math         65
2            English      72
2            History      84

This code

combined <- distinct(subject, unique_id, .keep_all = TRUE)

gives me this output:

   unique_id    subject     grade
1               Math          88
2               Math          65

I'd like to have it look like this:

   unique_id    subject                       grade
1               Math, English, History        88, 78, 98
2               Math, English, History        65, 72, 84

The solution below works perfectly! Here is one more wrinkle to the problem.

If I have a data set that looks like this:

unique_id   school  subject  grade  sex
    1       great   Math      88    
    1       great   English   78    
    1       great   History   98    male
    2       spring  Math      65    
    2       spring  English   72    female
    2       spring  History   84

and run the following code:

(r2 <- df %>%
  group_by(unique_id) %>% 
  summarise_each(funs(toString(unique(.)))))

I get the following output:

unique_id   school  subject                     grade       sex
    1       great   Math, English, History      88,78,98     , male 
    2       spring  English, English, History   65,72,84     , female

Is there a way to have it combine empty cells with out the commas in sex category, so it can look like this:

unique_id   school  subject                     grade       sex
    1       great   Math, English, History      88,78,98     male   
    2       spring  English, English, History   65,72,84     female

Thank you!

akrun · Accepted Answer

We can do this by pasting (toString is a wrapper for paste(., collapse=", ")) the unique elements together for each column after grouping by 'unique_id'

library(dplyr)
(r1 <- df %>%
        group_by(unique_id) %>% 
        summarise_each(funs(toString(unique(.)))))
#   unique_id                subject      grade
#                               
#1         1 Math, English, History 88, 78, 98
#2         2 Math, English, History 65, 72, 84

Or we can keep it as a list and then extract the list elements

r2 <- df %>% 
       group_by(unique_id) %>% 
       summarise_each(funs(list(unique(.))))

From 'r2', we can unnest (from tidyr) if we needed

library(tidyr)
r2 %>%
  unnest()

If we are using the OP's way with distinct, we can remove the unique from the summarise_each.

df %>%
   distinct(subject, unique_id, .keep_all=TRUE) %>%
   group_by(unique_id) %>%
   summarise_each(funs(toString(.)))

combine duplicates and keep unique elements using dplyr::distinct

Answers (1)

Related Questions