Sarah
Sarah

Reputation: 463

Creating a count table of unique values when there are multiple values in a single cell using R

I am trying to create a count table from a data table that looks like this:

df <- data.frame("Spring" = c("skirt, pants, shirt", "tshirt"), "Summer" = 
c("shorts, skirt", "pants, shoes"), Fall = c("Scarf", "purse, pants"))

               Spring        Summer         Fall
1 skirt, pants, shirt shorts, skirt        Scarf
2              tshirt  pants, shoes purse, pants

and then a count table that looks like this in the end:

output <- data.frame("Spring" = 4, "Summer" = 4, Fall = 3)

  Spring Summer Fall
1      4      4    3

So, I would just like it to count the unique values in a column for each season. I am having trouble with this because of the commas separating values within 1 cell. I tried using length(unique())), but it is not giving me the correct number because of the columns.

Any help is appreciated!!!

Upvotes: 1

Views: 210

Answers (2)

tmfmnk
tmfmnk

Reputation: 39858

One tidyverse possibility could be:

df %>%
 mutate_if(is.factor, as.character) %>%
 gather(var, val) %>%
 mutate(val = strsplit(val, ", ")) %>%
 unnest() %>%
 group_by(var) %>%
 summarise(val = n_distinct(val))

  var      val
  <chr>  <int>
1 Fall       3
2 Spring     4
3 Summer     4

If you want to match the desired output exactly, then you can add spread():

df %>%
 mutate_if(is.factor, as.character) %>%
 gather(var, val) %>%
 mutate(val = strsplit(val, ", ")) %>%
 unnest() %>%
 group_by(var) %>%
 summarise(val = n_distinct(val)) %>%
 spread(var, val)

   Fall Spring Summer
  <int>  <int>  <int>
1     3      4      4

Or using the basic idea from @Sonny (this requires just dplyr):

df %>%
 mutate_if(is.factor, as.character) %>%
 summarise_all(list(~ n_distinct(unlist(strsplit(., ", ")))))

  Spring Summer Fall
1      4      4    3

Upvotes: 1

Sonny
Sonny

Reputation: 3183

Using summarise_all:

getCount <- function(x) {
  x <- as.character(x)
  length(unique(unlist(strsplit(x, ","))))
}

library(dplyr)
df %>%
  summarise_all(funs(getCount))
  Spring Summer Fall
1      4      4    3

Upvotes: 1

Related Questions