create a summary data frame based on counts

Question

I am trying to use a data frame to create a second data frame of summary counts. My original data has the structure of:

mydata <- read.table(header=TRUE, text="
item    type    store1  store2  store3  store4  store5
chair   timber  0   1   4   0   6
chair   metal   0   1   4   1   9
chair   upholstered 3   0   0   1   1
table   indoor  1   8   0   1   0
table   outdoor 1   12  2   1   0
bed single  0   0   2   1   0
bed double  0   1   1   1   0
bed queen   1   0   0   1   3
bed king    5   0   1   3   0")

I want my summary data frame to count every type of furniture present at each store and give me a summary of which stock is at each store (just presence/absence, not number of items). It should look like this:

summary <- read.table(header=TRUE, text="
store   chair_types table_types bed_types   total_types
store1  1   2   2   5
store2  2   2   1   5
store3  2   1   3   6
store4  2   2   4   8
store5  3   0   1   4")

This would be easy in excel but I'm trying to bite the bullet and learn to do it properly. Apologies if this is a duplicate, but I couldn't find a similar example. Thanks in advance.

akrun · Accepted Answer

We can do this with dplyr/tidyr. After grouping by 'item', loop over the 'store' columns (summarise_each), get the number of non-zero elements in each 'store' column (sum(.!=0), convert to 'long' format (gather), paste the substriing '_types' to 'item', spread the 'long' format to 'wide', and create a 'total' column using rowSums

library(dplyr)
library(tidyr)
mydata %>% 
     group_by(item) %>%
     summarise_each(funs(sum(.!=0)), store1:store5) %>% 
     gather(store, val, store1:store5) %>% 
     mutate(item = paste0(item, "_types")) %>%
     spread(item, val) %>%
     mutate(total = rowSums(.[-1]))
#   store bed_types chair_types table_types total
#                       
#1 store1         2           1           2     5
#2 store2         1           2           2     5
#3 store3         3           2           1     6
#4 store4         4           2           2     8
#5 store5         1           3           0     4

This can also be done by first converting to 'long' format, grouped by 'item', 'store', get the number of non-zero elements (summarise), grouped by 'store', create the 'Total' column by adding up the 'val' and then spread

mydata %>% 
     gather(store, val, store1:store5) %>%
     group_by(item, store) %>% 
     summarise(val = sum(val!=0)) %>% 
     group_by(store) %>% 
     mutate(Total = sum(val)) %>% 
     spread(item, val)

We can also do this easily with base R using rowsum and addmargins

addmargins(t(rowsum(+(mydata[-(1:2)]!=0), mydata[,1])), 2)
#       bed chair table Sum
#store1   2     1     2   5
#store2   1     2     2   5
#store3   3     2     1   6
#store4   4     2     2   8
#store5   1     3     0   4

create a summary data frame based on counts

Answers (2)

Related Questions