pomegranate

Reputation: 765

Sum columns by group (row names) in a matrix

Let's say I have a matrix called x.

x <- structure(c(1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1), 
.Dim = c(5L, 4L), .Dimnames = list(c("Cake", "Pie", "Cake", "Pie", "Pie"),
c("Mon", "Tue", "Wed", "Thurs"))) 

x
     Mon   Tue   Wed   Thurs
Cake   1     0     1      1
Pie    0     0     1      1
Cake   1     1     0      1
Pie    0     0     1      1
Pie    0     0     1      1

I want to sum each column grouped by row names:

     Mon   Tue   Wed   Thurs
Cake   2     1     1      2
Pie    0     0     3      3

I've tried using addmargins(x), but that just gives me the sum of each column and row. Any suggestions? I searched other questions, but couldn't figure this out.

Upvotes: 9

Views: 6575

Answers (3)

David Arenburg

Reputation: 92300

Here's a vectorized base R solution. rowsum() adds up the rows that share a group label, here the row names of x:

rowsum(x, row.names(x))
#      Mon Tue Wed Thurs
# Cake   2   1   1     2
# Pie    0   0   3     3
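
To make clear what rowsum() is doing, here is a minimal sketch that rebuilds the matrix from the question and compares rowsum() with a manual split-and-colSums version. The names by_rowsum, idx and by_hand are only illustrative and not part of the original answer.

```r
# Rebuild the matrix from the question (values filled column by column)
x <- matrix(c(1, 0, 1, 0, 0,  0, 0, 1, 0, 0,  1, 1, 0, 1, 1,  1, 1, 1, 1, 1),
            nrow = 5,
            dimnames = list(c("Cake", "Pie", "Cake", "Pie", "Pie"),
                            c("Mon", "Tue", "Wed", "Thurs")))

# rowsum() adds up the rows that share a group label, column by column
by_rowsum <- rowsum(x, group = rownames(x))

# Manual equivalent: collect the row indices of each group, then take the
# column sums of the matching sub-matrix (drop = FALSE keeps it a matrix
# even when a group has a single row)
idx <- split(seq_len(nrow(x)), rownames(x))
by_hand <- t(sapply(idx, function(i) colSums(x[i, , drop = FALSE])))

# Both approaches should agree
all.equal(by_rowsum, by_hand)
```

The manual version is only there for comparison; rowsum() does the grouping in a single call.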

Or a data.table version, using keep.rownames = TRUE to convert the row names to a column (named rn) that you can group by:

library(data.table)
as.data.table(x, keep.rownames = TRUE)[, lapply(.SD, sum), by = rn]
#      rn Mon Tue Wed Thurs
# 1: Cake   2   1   1     2
# 2:  Pie   0   0   3     3

Upvotes: 11

Mamoun Benghezal

Reputation: 5314

You can try this, with the data read into a data frame that has a Name column:

df <- read.table(header=TRUE, text="
Name       Mon   Tue   Wed   Thurs
Cake   1     0     1      1
Pie    0     0     1      1
Cake   1     1     0      1
Pie    0     0     1      1
Pie    0     0     1      1")

aggregate(. ~ Name, data=df, FUN=sum)
##   Name Mon Tue Wed Thurs
## 1 Cake   2   1   1     2
## 2  Pie   0   0   3     3
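
If the matrix x from the question is already in your session, you do not have to retype the data. A minimal sketch, assuming you want the same df layout as in this answer (the helper name vals and the result name res are only illustrative):

```r
# Matrix from the question; the duplicated row names are the group labels
x <- matrix(c(1, 0, 1, 0, 0,  0, 0, 1, 0, 0,  1, 1, 0, 1, 1,  1, 1, 1, 1, 1),
            nrow = 5,
            dimnames = list(c("Cake", "Pie", "Cake", "Pie", "Pie"),
                            c("Mon", "Tue", "Wed", "Thurs")))

# Move the row names into an ordinary column. The row names are dropped
# from the copy first, so the duplicates cannot clash with the data
# frame's own row names.
vals <- x
rownames(vals) <- NULL
df <- data.frame(Name = rownames(x), vals)

# Same call as above: sum every remaining column within each Name
res <- aggregate(. ~ Name, data = df, FUN = sum)
res
```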

This also works with dplyr:

library(dplyr)
group_by(df, Name) %>%
    summarise(Mon = sum(Mon), Tue = sum(Tue), Wed = sum(Wed), Thurs = sum(Thurs))

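or better, without naming each column (summarise_each() and funs() are deprecated in current dplyr, so this uses across(), available from dplyr 1.0.0):

group_by(df, Name) %>%
    summarise(across(everything(), sum))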

Upvotes: 7

Colonel Beauvel

Reputation: 31181

An approach using plyr, splitting df by Name and summing every column except the first (Name):

library(plyr)
ldply(split(df, df$Name), function(u) colSums(u[-1]))
#   .id Mon Tue Wed Thurs
#1 Cake   2   1   1     2
#2  Pie   0   0   3     3

Data:

df = structure(list(Name = structure(c(1L, 2L, 1L, 2L, 2L), .Label = c("Cake", 
"Pie"), class = "factor"), Mon = c(1L, 0L, 1L, 0L, 0L), Tue = c(0L, 
0L, 1L, 0L, 0L), Wed = c(1L, 1L, 0L, 1L, 1L), Thurs = c(1L, 1L, 
1L, 1L, 1L)), .Names = c("Name", "Mon", "Tue", "Wed", "Thurs"
), row.names = c(NA, -5L), class = "data.frame")

Upvotes: 2
