Reputation: 348
I'd like to combine rows, delete dups, but keep uniques. I can do the first two, but not the second. I have a df called subject which looks like this:
unique_id subject grade
1 Math 88
1 English 78
1 History 98
2 Math 65
2 English 72
2 History 84
This code
combined <- distinct(subject, unique_id, .keep_all = TRUE)
gives me this output:
unique_id subject grade
1 Math 88
2 Math 65
I'd like to have it look like this:
unique_id subject grade
1 Math, English, History 88, 78, 98
2 Math, English, History 65, 72, 84
The solution below works perfectly! Here is one more wrinkle to the problem.
If I have a data set that looks like this:
unique_id school subject grade sex
1 great Math 88
1 great English 78
1 great History 98 male
2 spring Math 65
2 spring English 72 female
2 spring History 84
and run the following code:
(r2 <- df %>%
group_by(unique_id) %>%
summarise_each(funs(toString(unique(.)))))
I get the following output:
unique_id school subject grade sex
1 great Math, English, History 88,78,98 , male
2 spring English, English, History 65,72,84 , female
Is there a way to have it combine empty cells with out the commas in sex category, so it can look like this:
unique_id school subject grade sex
1 great Math, English, History 88,78,98 male
2 spring English, English, History 65,72,84 female
Thank you!
Upvotes: 1
Views: 130
Reputation: 887531
We can do this by pasting (toString
is a wrapper for paste(., collapse=", ")
) the unique elements together for each column after grouping by 'unique_id'
library(dplyr)
(r1 <- df %>%
group_by(unique_id) %>%
summarise_each(funs(toString(unique(.)))))
# unique_id subject grade
# <int> <chr> <chr>
#1 1 Math, English, History 88, 78, 98
#2 2 Math, English, History 65, 72, 84
Or we can keep it as a list
and then extract the list
elements
r2 <- df %>%
group_by(unique_id) %>%
summarise_each(funs(list(unique(.))))
From 'r2', we can unnest
(from tidyr
) if we needed
library(tidyr)
r2 %>%
unnest()
If we are using the OP's way with distinct
, we can remove the unique
from the summarise_each
.
df %>%
distinct(subject, unique_id, .keep_all=TRUE) %>%
group_by(unique_id) %>%
summarise_each(funs(toString(.)))
Upvotes: 2