swolf

Reputation: 1143

How do I sum the values of columns in several tables if tables have different lengths?

Alright, this should be an easy one but I'm looking for a solution that's as fast as possible.

Let's say I have 3 tables (in practice the number of tables will be much larger):

tab1 <- table(c(1, 1, 1, 2, 2, 3, 3, 3))
tab2 <- table(c(1, 1, 4, 4, 4))
tab3 <- table(c(1, 1, 2, 3, 5))

This is what we get:

> tab1
1 2 3 
3 2 3 
> tab2
1 4 
2 3 
> tab3
1 2 3 5 
2 1 1 1 

What I want, computed quickly enough to work with many big tables, is this:

1 2 3 4 5
7 3 4 3 1

So, basically the tables get aggregated over all names. Is there an elementary function that does this which I am missing? Thanks for your help!

Upvotes: 8

Views: 173

Answers (3)

Rich Scriven

Reputation: 99361

You could use rowsum(). It returns a one-column matrix rather than the named vector you show, but you can always restructure it after the calculation. rowsum() is known to be very efficient.

x <- c(tab1, tab2, tab3)
rowsum(x, names(x))
#   [,1]
# 1    7
# 2    3
# 3    4
# 4    3
# 5    1
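If you want the named-vector layout from the question back, one option is to take the single column of the matrix, which keeps the row names as names. A minimal sketch, assuming the same tab1, tab2 and tab3 as in the question (res and out are just illustrative names):

```r
# Tables from the question
tab1 <- table(c(1, 1, 1, 2, 2, 3, 3, 3))
tab2 <- table(c(1, 1, 4, 4, 4))
tab3 <- table(c(1, 1, 2, 3, 5))

# Concatenate into one named vector and sum by name
x <- c(tab1, tab2, tab3)
res <- rowsum(x, names(x))

# Extracting the only column drops the matrix dimension;
# the row names carry over as the names of the vector
out <- res[, 1]
out
# 1 2 3 4 5 
# 7 3 4 3 1 
```

The restructuring happens once, on the small aggregated result, so it should not matter much for speed.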

Here's a benchmark with akrun's data.table suggestion added in as well.

library(microbenchmark)
library(data.table)

xx <- rep(x, 1e5)

microbenchmark(
    tapply = tapply(xx, names(xx), FUN=sum),
    rowsum = rowsum(xx, names(xx)),
    data.table = data.table(xx, names(xx))[, sum(xx), by = V2]
)
# Unit: milliseconds
#        expr       min        lq      mean    median        uq       max neval
#      tapply 150.47532 154.80200 176.22410 159.02577 204.22043 233.34346   100
#      rowsum  41.28635  41.65162  51.85777  43.33885  45.43370 109.91777   100
#  data.table  21.39438  24.73580  35.53500  27.56778  31.93182  92.74386   100

Upvotes: 5

akrun

Reputation: 887691

We concatenate (c) the tables to create 'v1', then use tapply to get the sum of the elements grouped by the names of that object.

v1 <- c(tab1, tab2, tab3)
tapply(v1, names(v1), FUN=sum)
#1 2 3 4 5 
#7 3 4 3 1 
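With a much larger number of tables, they will probably sit in a list rather than in separate variables. The same idea should still work by concatenating the list with do.call(). A sketch, assuming a list called tabs (a name made up here for illustration):

```r
# Tables from the question, collected in a list
tab1 <- table(c(1, 1, 1, 2, 2, 3, 3, 3))
tab2 <- table(c(1, 1, 4, 4, 4))
tab3 <- table(c(1, 1, 2, 3, 5))
tabs <- list(tab1, tab2, tab3)

# unname() guards against list names being prefixed onto the value names
v1 <- do.call(c, unname(tabs))

# Sum the counts grouped by the original table names
res <- tapply(v1, names(v1), FUN = sum)
res
```

Note that tapply groups on the names as character strings, so labels such as "10" and "2" would be ordered as text, not as numbers.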

Upvotes: 12

Mamoun Benghezal

Reputation: 5314

You can try this, which stacks the tables as one-column matrices and aggregates by row name:

df <- rbind(as.matrix(tab1), as.matrix(tab2), as.matrix(tab3))
aggregate(df, by=list(row.names(df)), FUN=sum)
  Group.1 V1
1       1  7
2       2  3
3       3  4
4       4  3
5       5  1

Upvotes: 1
