Reputation: 742
I'd like to make a frequency table like this in R:
df = data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,9,8,7))
apply(df, 2, table)
# outputs:
# aa bb cc
# 7 1 1 2
# 8 2 1 1
# 9 1 2 1
But if one of the columns of df has a count of 0 for some value (e.g. if we change the above so that df$cc has no 9), we get a list instead of a nice dataframe.
# example that gives a list
df = data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,8,8,7))
apply(df, 2, table)
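A quick check of why this happens (a sketch; the name `res` is only illustrative): the per-column tables no longer have the same length, so apply() cannot simplify them into a matrix and falls back to a list.

```r
# same data as above: cc has no 9
df <- data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,8,8,7))

res <- apply(df, 2, table)

is.list(res)   # TRUE: the result was not simplified to a matrix
lengths(res)   # number of distinct values found in each column
```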
What's a simple way to do something similar that will guarantee dataframe output regardless of the counts?
I can imagine a number of solutions that seem messy or hacked, for example, this produces the desired result:
# example of a messy but correct solution
df = data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,8,8,7))
apply(df, 2, function(x) summary(factor(x, levels = unique(unlist(df)))))
Is there a cleaner way to do this?
Upvotes: 1
Views: 483
Reputation: 146144
I'll go ahead and answer, though I still object to the lack of criteria. If we think of "tidy" as the opposite of "messy", then we should first tidy the input data into a long format. Then we can do a two-way table:
library(tidyr)
df %>% gather %>%
with(table(value, key))
# key
# value aa bb cc
# 7 1 1 2
# 8 2 1 2
# 9 1 2 0
Thanks to Markus for a base R version:
table(stack(df))
# ind
# values aa bb cc
# 7 1 1 2
# 8 2 1 2
# 9 1 2 0
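Both versions above return a table object rather than a data.frame. If an actual data.frame is required, one possible sketch is to convert the two-way table with base R's as.data.frame.matrix(), which keeps the wide layout (plain as.data.frame() on a table would give a long format instead). The names `tab` and `freq` are only illustrative.

```r
# same data as in the question: cc has no 9
df <- data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,8,8,7))

# two-way table of value by column, zero counts included
tab <- table(stack(df))

# convert to a wide data.frame; the values become row names
freq <- as.data.frame.matrix(tab)

is.data.frame(freq)   # TRUE
freq
```

The zero count for 9 in cc is preserved because the table is built over all values at once, not column by column.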
Upvotes: 3