setbackademic
setbackademic

Reputation: 143

create a summary data frame based on counts

I am trying to use a data frame to create a second data frame of summary counts. My original data has the structure of:

mydata <- read.table(header=TRUE, text="
item    type    store1  store2  store3  store4  store5
chair   timber  0   1   4   0   6
chair   metal   0   1   4   1   9
chair   upholstered 3   0   0   1   1
table   indoor  1   8   0   1   0
table   outdoor 1   12  2   1   0
bed single  0   0   2   1   0
bed double  0   1   1   1   0
bed queen   1   0   0   1   3
bed king    5   0   1   3   0")

I want my summary data frame to count every type of furniture present at each store and give me a summary of which stock is at each store (just presence/absence, not number of items). It should look like this:

summary <- read.table(header=TRUE, text="
store   chair_types table_types bed_types   total_types
store1  1   2   2   5
store2  2   2   1   5
store3  2   1   3   6
store4  2   2   4   8
store5  3   0   1   4")

This would be easy in excel but I'm trying to bite the bullet and learn to do it properly. Apologies if this is a duplicate, but I couldn't find a similar example. Thanks in advance.

Upvotes: 3

Views: 2967

Answers (2)

Barker
Barker

Reputation: 2094

The core of what you want can be done with the R function aggregate in the base R stats package

> aggregated <- aggregate(mydata[grep("store",names(mydata))], 
                          by = mydata["item"], 
                          FUN = function(x) sum(x != 0))
> aggregated
   item store1 store2 store3 store4 store5
1   bed      2      1      3      4      1
2 chair      1      2      2      2      3
3 table      2      2      1      2      0

The first parameter mydata[grep("store",names(mydata))] is selecting the "stores" from your data frame. The second parameter by = mydata["item"] is indicating that your want to use the "item" as to identify groups in your data frame. Finally, FUN = function(x) sum(x != 0) indicates that you want to count the number of non-zero elements for each item for each of your store columns.

This may be sufficient, but if you would like to reformat it more similarly to what you have above you can do:

> summary <- as.data.frame(t(aggregated[-1]))
> names(summary) <- aggregated[[1]]
> summary[["total"]] <- rowSums(summary)
> summary
       bed chair table total
store1   2     1     2     5
store2   1     2     2     5
store3   3     2     1     6
store4   4     2     2     8
store5   1     3     0     4

Upvotes: 3

akrun
akrun

Reputation: 887971

We can do this with dplyr/tidyr. After grouping by 'item', loop over the 'store' columns (summarise_each), get the number of non-zero elements in each 'store' column (sum(.!=0), convert to 'long' format (gather), paste the substriing '_types' to 'item', spread the 'long' format to 'wide', and create a 'total' column using rowSums

library(dplyr)
library(tidyr)
mydata %>% 
     group_by(item) %>%
     summarise_each(funs(sum(.!=0)), store1:store5) %>% 
     gather(store, val, store1:store5) %>% 
     mutate(item = paste0(item, "_types")) %>%
     spread(item, val) %>%
     mutate(total = rowSums(.[-1]))
#   store bed_types chair_types table_types total
#   <chr>     <int>       <int>       <int> <dbl>
#1 store1         2           1           2     5
#2 store2         1           2           2     5
#3 store3         3           2           1     6
#4 store4         4           2           2     8
#5 store5         1           3           0     4

This can also be done by first converting to 'long' format, grouped by 'item', 'store', get the number of non-zero elements (summarise), grouped by 'store', create the 'Total' column by adding up the 'val' and then spread

mydata %>% 
     gather(store, val, store1:store5) %>%
     group_by(item, store) %>% 
     summarise(val = sum(val!=0)) %>% 
     group_by(store) %>% 
     mutate(Total = sum(val)) %>% 
     spread(item, val)

We can also do this easily with base R using rowsum and addmargins

addmargins(t(rowsum(+(mydata[-(1:2)]!=0), mydata[,1])), 2)
#       bed chair table Sum
#store1   2     1     2   5
#store2   1     2     2   5
#store3   3     2     1   6
#store4   4     2     2   8
#store5   1     3     0   4

Upvotes: 3

Related Questions