Creating a count table of unique values when there are multiple values in a single cell using R

Question

I am trying to create a count table from a data table that looks like this:

df <- data.frame("Spring" = c("skirt, pants, shirt", "tshirt"), "Summer" = 
c("shorts, skirt", "pants, shoes"), Fall = c("Scarf", "purse, pants"))

               Spring        Summer         Fall
1 skirt, pants, shirt shorts, skirt        Scarf
2              tshirt  pants, shoes purse, pants

and then a count table that looks like this in the end:

output <- data.frame("Spring" = 4, "Summer" = 4, Fall = 3)

  Spring Summer Fall
1      4      4    3

So, I would just like it to count the unique values in a column for each season. I am having trouble with this because of the commas separating values within 1 cell. I tried using length(unique())), but it is not giving me the correct number because of the columns.

Any help is appreciated!!!

tmfmnk · Accepted Answer

One tidyverse possibility could be:

df %>%
 mutate_if(is.factor, as.character) %>%
 gather(var, val) %>%
 mutate(val = strsplit(val, ", ")) %>%
 unnest() %>%
 group_by(var) %>%
 summarise(val = n_distinct(val))

  var      val
    
1 Fall       3
2 Spring     4
3 Summer     4

If you want to match the desired output exactly, then you can add spread():

df %>%
 mutate_if(is.factor, as.character) %>%
 gather(var, val) %>%
 mutate(val = strsplit(val, ", ")) %>%
 unnest() %>%
 group_by(var) %>%
 summarise(val = n_distinct(val)) %>%
 spread(var, val)

   Fall Spring Summer
      
1     3      4      4

Or using the basic idea from @Sonny (this requires just dplyr):

df %>%
 mutate_if(is.factor, as.character) %>%
 summarise_all(list(~ n_distinct(unlist(strsplit(., ", ")))))

  Spring Summer Fall
1      4      4    3

Creating a count table of unique values when there are multiple values in a single cell using R

Answers (2)

Related Questions