Reputation: 463
I am trying to create a count table from a data table that looks like this:
df <- data.frame("Spring" = c("skirt, pants, shirt", "tshirt"), "Summer" =
c("shorts, skirt", "pants, shoes"), Fall = c("Scarf", "purse, pants"))
Spring Summer Fall
1 skirt, pants, shirt shorts, skirt Scarf
2 tshirt pants, shoes purse, pants
and then a count table that looks like this in the end:
output <- data.frame("Spring" = 4, "Summer" = 4, Fall = 3)
Spring Summer Fall
1 4 4 3
So, I would just like it to count the unique values in a column for each season. I am having trouble with this because of the commas separating values within 1 cell. I tried using length(unique())), but it is not giving me the correct number because of the columns.
Any help is appreciated!!!
Upvotes: 1
Views: 210
Reputation: 39858
One tidyverse
possibility could be:
df %>%
mutate_if(is.factor, as.character) %>%
gather(var, val) %>%
mutate(val = strsplit(val, ", ")) %>%
unnest() %>%
group_by(var) %>%
summarise(val = n_distinct(val))
var val
<chr> <int>
1 Fall 3
2 Spring 4
3 Summer 4
If you want to match the desired output exactly, then you can add spread()
:
df %>%
mutate_if(is.factor, as.character) %>%
gather(var, val) %>%
mutate(val = strsplit(val, ", ")) %>%
unnest() %>%
group_by(var) %>%
summarise(val = n_distinct(val)) %>%
spread(var, val)
Fall Spring Summer
<int> <int> <int>
1 3 4 4
Or using the basic idea from @Sonny (this requires just dplyr
):
df %>%
mutate_if(is.factor, as.character) %>%
summarise_all(list(~ n_distinct(unlist(strsplit(., ", ")))))
Spring Summer Fall
1 4 4 3
Upvotes: 1
Reputation: 3183
Using summarise_all
:
getCount <- function(x) {
x <- as.character(x)
length(unique(unlist(strsplit(x, ","))))
}
library(dplyr)
df %>%
summarise_all(funs(getCount))
Spring Summer Fall
1 4 4 3
Upvotes: 1