dshgna

Reputation: 842

R : Combine categorical row values in a group to a single value

I have a dataframe that is similar in structure to the one below.

Name Label
A historical
A comedy
B tragedy
C comedy
C young adult

I want to combine this into a dataframe so that all labels with a common name appear in one row. The ideal output would look something like this.

Name Labels
A "historical", "comedy"
B "tragedy"
C "comedy", "young adult"

So far, I have tried grouping the data together, and then applying a function to the group using group_map().

library(tidyverse)

test_df <- data.frame(name = c('A', 'A', 'B', 'C', 'C'), 
                      labels = c('historical', 'comedy', 'tragedy', 'comedy', 'young adult'))

combined_label <- function(dt, ...) {
    print(dt[['labels']]) 
    dt['labels'] <- dt[['labels']]
    print(dt['labels']) 
    return(dt)}

test_df <- test_df %>%
    group_by(name) %>%
    group_map(combined_label)

However, this approach doesn't seem to work. While dt[['labels']] does give a factor of all the values in the group (e.g. c('historical', 'comedy')), I am having difficulty combining them into a single value. What I get as my output is:

A tibble: 2 × 1
labels
<fct>
crime
horror
A tibble: 2 × 1
labels
<fct>
comedy
historical

Any help would be greatly appreciated!

Upvotes: 1

Views: 910

Answers (3)

akrun

Reputation: 887901

Using aggregate from base R

aggregate(Label ~ Name, d, FUN = toString)
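For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question, rebuilt here as `d`; the name `res` is only chosen for illustration.

```r
# Sample data, same values as in the question
d <- data.frame(
  Name  = c("A", "A", "B", "C", "C"),
  Label = c("historical", "comedy", "tragedy", "comedy", "young adult")
)

# toString() collapses each group's labels into one comma-separated string
res <- aggregate(Label ~ Name, data = d, FUN = toString)
print(res)
```

The result should have one row per Name, with Label holding a single string such as "historical, comedy" for A.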

Upvotes: 0

AnilGoyal

Reputation: 26238

toString() may also help here, if the output is not required as a list column:

library(dplyr)

d %>% group_by(Name) %>% summarise(Label = toString(Label))

# A tibble: 3 x 2
  Name  Label              
  <chr> <chr>              
1 A     historical, comedy 
2 B     tragedy            
3 C     comedy, young adult
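If the double quotes around each label in the desired output matter, one possible variation is to wrap each label in quotes before collapsing. This is only a sketch: it assumes the same sample data (rebuilt here as `d`), and swaps toString() for paste0() with `collapse`; the column name `Labels` follows the question's ideal output.

```r
library(dplyr)

# Sample data, same values as in the question
d <- data.frame(
  Name  = c("A", "A", "B", "C", "C"),
  Label = c("historical", "comedy", "tragedy", "comedy", "young adult")
)

# Wrap every label in double quotes, then join the group's labels with ", "
res <- d %>%
  group_by(Name) %>%
  summarise(Labels = paste0('"', Label, '"', collapse = ", "))

# cat() shows the raw strings without R's escaping of the inner quotes
cat(res$Labels, sep = "\n")
```

For Name A this should give the string "historical", "comedy", quotes included.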

Upvotes: 1

bouncyball

Reputation: 10781

Here's a way to do this using dplyr:

library(dplyr) 

d %>%
    group_by(Name) %>%
    summarise(Label1 = list(Label)) 

  Name  Label1   
  <chr> <list>   
1 A     <chr [2]>
2 B     <chr [1]>
3 C     <chr [2]>
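Since the list column hides the values when printed, here is a short sketch of how the grouped labels might be pulled back out. It assumes the same sample data as in the Data section (rebuilt here so the snippet runs on its own); `res` is just an illustrative name.

```r
library(dplyr)

# Sample data, same values as in the Data section
d <- data.frame(
  Name  = c("A", "A", "B", "C", "C"),
  Label = c("historical", "comedy", "tragedy", "comedy", "young adult")
)

res <- d %>%
  group_by(Name) %>%
  summarise(Label1 = list(Label))

# Each element of the list column is the vector of labels for one group
first_group <- res$Label1[[1]]   # labels for Name "A"
print(first_group)

# lengths() gives the number of labels collected per group
n_labels <- lengths(res$Label1)
print(n_labels)
```

A list column keeps each label as a separate element, which can be handy if the labels need further processing rather than display.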

And another way, using aggregate:

aggregate(Label ~ Name, data = d, FUN = c)

  Name               Label
1    A  historical, comedy
2    B             tragedy
3    C comedy, young adult

Data

d <- structure(list(Name = c("A", "A", "B", "C", "C"), 
                    Label = c("historical", "comedy", "tragedy", 
                              "comedy", "young adult")), 
               row.names = c(NA, -5L), class = "data.frame")

Upvotes: 3
