Nathalie
Nathalie

Reputation: 1238

How to produce a frequency dataframe using 3 columns?

From a dataframe of 3 columns

df <- data.frame(month = c(1,1,1,2,2,2,3,3,4,4), country = c("US", "US", "US", "Brazil", "US", "Brazil", "UK", "UK", "Brazil", "US"), id = c(1,2,3,4,5,6,7,8,9,10))

I would like to count the id for every country for every month. How is it possible to take the result of table() with more than one columns?

Here an example output:

df2 <- data.frame(month=c(1,1,1,2,2,2,3,3,3,4,4,4), country = c("US", "Brazil", "UK","US", "Brazil", "UK","US", "Brazil", "UK","US", "Brazil", "UK"), frequency = c(3,0,0,1,2,0,0,0,2,1,1,0))
> df2
   month country frequency
1      1      US         3
2      1  Brazil         0
3      1      UK         0
4      2      US         1
5      2  Brazil         2
6      2      UK         0
7      3      US         0
8      3  Brazil         0
9      3      UK         2
10     4      US         1
11     4  Brazil         1
12     4      UK         0

Upvotes: 0

Views: 114

Answers (2)

r2evans
r2evans

Reputation: 160407

In general, whenever I think about frequency tables with more than one column (table(...)), I think of xtabs(~each + and + every + column, data=myframe). That's not to say that table doesn't support multiple columns ... it does, but I find the formula interface a little more intuitive to me.

xtabs returns 3D tables as well(see xtabs(~cyl+vs+am,data=mtcars)), and as you can tell, it can easily be converted into a "long"-style frame (which it has in common with table). (In fact, the output from xtabs includes the class "table", so anything that works on the latter should work on xtabs. *shrug*)

as.data.frame(xtabs(~ month + country, data = df))
#    month country Freq
# 1      1  Brazil    0
# 2      2  Brazil    2
# 3      3  Brazil    0
# 4      4  Brazil    1
# 5      1      UK    0
# 6      2      UK    0
# 7      3      UK    2
# 8      4      UK    0
# 9      1      US    3
# 10     2      US    1
# 11     3      US    0
# 12     4      US    1

Upvotes: 3

Shree
Shree

Reputation: 11140

Here's a way with dplyr and tidyr::complete() -

df %>% 
  count(month, country) %>% 
  complete(month, country, fill = list(n = 0))

# A tibble: 12 x 3
   month country     n
   <dbl> <fct>   <dbl>
 1     1 Brazil      0
 2     1 UK          0
 3     1 US          3
 4     2 Brazil      2
 5     2 UK          0
 6     2 US          1
 7     3 Brazil      0
 8     3 UK          2
 9     3 US          0
10     4 Brazil      1
11     4 UK          0
12     4 US          1

Upvotes: 3

Related Questions