Reputation: 143
I am trying to use a data frame to create a second data frame of summary counts. My original data has the structure of:
mydata <- read.table(header=TRUE, text="
item type store1 store2 store3 store4 store5
chair timber 0 1 4 0 6
chair metal 0 1 4 1 9
chair upholstered 3 0 0 1 1
table indoor 1 8 0 1 0
table outdoor 1 12 2 1 0
bed single 0 0 2 1 0
bed double 0 1 1 1 0
bed queen 1 0 0 1 3
bed king 5 0 1 3 0")
I want my summary data frame to count every type of furniture present at each store and give me a summary of which stock is at each store (just presence/absence, not number of items). It should look like this:
summary <- read.table(header=TRUE, text="
store chair_types table_types bed_types total_types
store1 1 2 2 5
store2 2 2 1 5
store3 2 1 3 6
store4 2 2 4 8
store5 3 0 1 4")
This would be easy in excel but I'm trying to bite the bullet and learn to do it properly. Apologies if this is a duplicate, but I couldn't find a similar example. Thanks in advance.
Upvotes: 3
Views: 2967
Reputation: 2094
The core of what you want can be done with the R function aggregate
in the base R stats
package
> aggregated <- aggregate(mydata[grep("store",names(mydata))],
by = mydata["item"],
FUN = function(x) sum(x != 0))
> aggregated
item store1 store2 store3 store4 store5
1 bed 2 1 3 4 1
2 chair 1 2 2 2 3
3 table 2 2 1 2 0
The first parameter mydata[grep("store",names(mydata))]
is selecting the "stores" from your data frame. The second parameter by = mydata["item"]
is indicating that your want to use the "item" as to identify groups in your data frame. Finally, FUN = function(x) sum(x != 0)
indicates that you want to count the number of non-zero elements for each item for each of your store columns.
This may be sufficient, but if you would like to reformat it more similarly to what you have above you can do:
> summary <- as.data.frame(t(aggregated[-1]))
> names(summary) <- aggregated[[1]]
> summary[["total"]] <- rowSums(summary)
> summary
bed chair table total
store1 2 1 2 5
store2 1 2 2 5
store3 3 2 1 6
store4 4 2 2 8
store5 1 3 0 4
Upvotes: 3
Reputation: 887971
We can do this with dplyr/tidyr
. After grouping by 'item', loop over the 'store' columns (summarise_each
), get the number of non-zero elements in each 'store' column (sum(.!=0
), convert to 'long' format (gather
), paste
the substriing '_types' to 'item', spread
the 'long' format to 'wide', and create a 'total' column using rowSums
library(dplyr)
library(tidyr)
mydata %>%
group_by(item) %>%
summarise_each(funs(sum(.!=0)), store1:store5) %>%
gather(store, val, store1:store5) %>%
mutate(item = paste0(item, "_types")) %>%
spread(item, val) %>%
mutate(total = rowSums(.[-1]))
# store bed_types chair_types table_types total
# <chr> <int> <int> <int> <dbl>
#1 store1 2 1 2 5
#2 store2 1 2 2 5
#3 store3 3 2 1 6
#4 store4 4 2 2 8
#5 store5 1 3 0 4
This can also be done by first converting to 'long' format, grouped by 'item', 'store', get the number of non-zero elements (summarise
), grouped by 'store', create the 'Total' column by adding up the 'val' and then spread
mydata %>%
gather(store, val, store1:store5) %>%
group_by(item, store) %>%
summarise(val = sum(val!=0)) %>%
group_by(store) %>%
mutate(Total = sum(val)) %>%
spread(item, val)
We can also do this easily with base R
using rowsum
and addmargins
addmargins(t(rowsum(+(mydata[-(1:2)]!=0), mydata[,1])), 2)
# bed chair table Sum
#store1 2 1 2 5
#store2 1 2 2 5
#store3 3 2 1 6
#store4 4 2 2 8
#store5 1 3 0 4
Upvotes: 3