Thomas Rosa

Reputation: 742

How do you make a multiple variable frequency table in R when not all values are present in all columns?

I'd like to make a frequency table like this in R:

df = data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,9,8,7))
apply(df, 2, table)

# outputs:  
#   aa bb cc
# 7  1  1  2
# 8  2  1  1
# 9  1  2  1

But if some value never occurs in one of the columns, so that its count would be 0 (e.g. if we change the above so that df$cc has no 9), the per-column tables no longer have the same length and we get a list instead of a nice rectangular table.

# example that gives a list
df = data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,8,8,7))
apply(df, 2, table)
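
To make the failure concrete, here is a minimal sketch that inspects the result. The variable name res is only illustrative; the data are the same as in the example just above.

```r
# Same data as the failing example: cc contains no 9
df <- data.frame(aa = c(9, 8, 7, 8), bb = c(9, 7, 9, 8), cc = c(7, 8, 8, 7))

# 'res' is a hypothetical name used only for this illustration
res <- apply(df, 2, table)

# apply() can only simplify to a matrix when every result has the same length
is.list(res)
# [1] TRUE
lengths(res)
# aa bb cc
#  3  3  2
```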

What's a simple way to do something similar that will guarantee dataframe output regardless of the counts?

I can imagine a number of solutions that seem messy or hacked, for example, this produces the desired result:

# example of a messy but correct solution
df = data.frame(aa = c(9,8,7,8), bb = c(9,7,9,8), cc = c(7,8,8,7))
apply(df, 2, function(x) summary(factor(x, levels = unique(unlist(df)))))

Is there a cleaner way to do this?

Upvotes: 1

Views: 483

Answers (1)

Gregor Thomas

Reputation: 146144

I'll go ahead and answer, though I still object to the lack of criteria. If we think of "tidy" as the opposite of "messy", then we should first tidy the input data into a long format. Then we can do a two-way table:

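library(tidyr)  # also makes the %>% pipe available
# reshape to long format (columns: key, value), then cross-tabulate
df %>% gather() %>%
  with(table(value, key))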
#      key
# value aa bb cc
#     7  1  1  2
#     8  2  1  2
#     9  1  2  0
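
In more recent tidyr releases gather() is marked as superseded in favour of pivot_longer(). A rough equivalent of the pipeline above might look like the sketch below, assuming tidyr 1.0.0 or later is installed. The column names key and value are chosen here only to mirror gather()'s defaults, and tab is an illustrative variable name.

```r
library(tidyr)

df <- data.frame(aa = c(9, 8, 7, 8), bb = c(9, 7, 9, 8), cc = c(7, 8, 8, 7))

# Reshape every column into long format; "key" and "value" are chosen
# only to match the names gather() produces by default
tab <- df %>%
  pivot_longer(cols = everything(), names_to = "key", values_to = "value") %>%
  with(table(value, key))

tab
```

Because the long data is cross-tabulated in one call, a value that is absent from a column shows up as a 0 count rather than being dropped.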

Thanks to Markus for a base R version:

table(stack(df))
#       ind
# values aa bb cc
#      7  1  1  2
#      8  2  1  2
#      9  1  2  0
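
Both versions return a table object rather than a data frame. If an actual data frame is required, as the question asks, one option is to convert the two-way table with as.data.frame.matrix(), which keeps the wide layout. This is a sketch of that idea in base R; freq is an illustrative variable name.

```r
df <- data.frame(aa = c(9, 8, 7, 8), bb = c(9, 7, 9, 8), cc = c(7, 8, 8, 7))

# as.data.frame() on a table would give long format (one row per cell);
# as.data.frame.matrix() keeps one row per value and one column per variable
freq <- as.data.frame.matrix(table(stack(df)))

freq
is.data.frame(freq)
# [1] TRUE
```

The distinct values end up as row names ("7", "8", "9"), so a lookup such as freq["9", "cc"] returns the count for that value and column.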

Upvotes: 3
