Reputation: 842
I have a dataframe which is similar in structure to the one below.
Name | Label |
---|---|
A | historical |
A | comedy |
B | tragedy |
C | comedy |
C | young adult |
I want to combine this into a dataframe so that all labels with a common name appear in one row. The ideal output would look something like this.
Name | Labels |
---|---|
A | "historical", "comedy" |
B | "tragedy" |
C | "comedy", "young adult" |
So far, I have tried grouping the data and then applying a function to each group using group_map().
library(tidyverse)
test_df <- data.frame(name = c('A', 'A', 'B', 'C', 'C'),
                      labels = c('historical', 'comedy', 'tragedy', 'comedy', 'young adult'))
combined_label <- function(dt, ...) {
  print(dt[['labels']])
  dt['labels'] <- dt[['labels']]
  print(dt['labels'])
  return(dt)
}
test_df <- test_df %>%
  group_by(name) %>%
  group_map(combined_label)
However, this approach doesn't seem to work. While dt[['labels']] does give a factor of all the values (e.g. c('historical', 'comedy')), I am having difficulty combining them. What I get as my output is:
A tibble: 2 × 1
labels
<fct>
crime
horror
A tibble: 2 × 1
labels
<fct>
comedy
historical
Any help would be greatly appreciated!
Upvotes: 1
Views: 910
Reputation: 887901
Using aggregate from base R:
aggregate(Label ~ Name, d, FUN = toString)
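For anyone who wants to run this in isolation, here is a minimal sketch. It assumes `d` is a plain data frame with the `Name` and `Label` columns from the question's table; the variable name `res` is just for illustration.

```r
# Example data, assumed to mirror the table in the question
d <- data.frame(
  Name  = c("A", "A", "B", "C", "C"),
  Label = c("historical", "comedy", "tragedy", "comedy", "young adult")
)

# One row per Name; toString() joins each group's labels with ", "
res <- aggregate(Label ~ Name, data = d, FUN = toString)
print(res)
```

The result is an ordinary data frame whose `Label` column holds one comma-separated string per name.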
Upvotes: 0
Reputation: 26238
toString() may also help here, if the output is not required as a list column:
library(dplyr)
d %>% group_by(Name) %>% summarise(Label = toString(Label))
# A tibble: 3 x 2
Name Label
<chr> <chr>
1 A historical, comedy
2 B tragedy
3 C comedy, young adult
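If the quoted form from the question ("historical", "comedy") is wanted, one possible variation is to wrap each label in double quotes before collapsing. This is only a sketch under the assumption that `d` has the `Name` and `Label` columns shown in the question; the output column name `Labels` and the variable `res` are chosen here for illustration.

```r
library(dplyr)

# Example data, assumed to mirror the table in the question
d <- data.frame(
  Name  = c("A", "A", "B", "C", "C"),
  Label = c("historical", "comedy", "tragedy", "comedy", "young adult")
)

# paste0() wraps each label in double quotes, then collapse joins them with ", "
res <- d %>%
  group_by(Name) %>%
  summarise(Labels = paste0('"', Label, '"', collapse = ", "))

# cat() shows the raw strings without escaping the embedded quotes
cat(res$Labels, sep = "\n")
```

Printing the tibble directly may show the embedded quotes escaped, which is why `cat()` is used to inspect the strings.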
Upvotes: 1
Reputation: 10781
Here's a way to do this using dplyr:
library(dplyr)
d %>%
  group_by(Name) %>%
  summarise(Label1 = list(Label))
Name Label1
<chr> <list>
1 A <chr [2]>
2 B <chr [1]>
3 C <chr [2]>
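Since the tibble only prints a summary of each list element, it may help to see how the individual labels can be pulled back out. The sketch below assumes the same `d` as in the rest of this answer and stores the result in a variable called `res` (a name introduced here purely for illustration).

```r
library(dplyr)

# Example data, assumed to mirror the table in the question
d <- data.frame(
  Name  = c("A", "A", "B", "C", "C"),
  Label = c("historical", "comedy", "tragedy", "comedy", "young adult")
)

res <- d %>%
  group_by(Name) %>%
  summarise(Label1 = list(Label))

# Each element of the list column is a character vector of that group's labels
first_labels <- res$Label1[[1]]
print(first_labels)

# Number of labels per Name
label_counts <- lengths(res$Label1)
print(label_counts)
```

A list column keeps the labels as separate values, which is convenient if they need further processing rather than display.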
And another way, using aggregate:
aggregate(Label ~ Name, data = d, FUN = c)
Name Label
1 A historical, comedy
2 B tragedy
3 C comedy, young adult
Data:
d <- structure(list(Name = c("A", "A", "B", "C", "C"),
                    Label = c("historical", "comedy", "tragedy",
                              "comedy", "young adult")),
               row.names = c(NA, -5L), class = "data.frame")
Upvotes: 3