Glen_b
Glen_b

Reputation: 8252

Easy way of "adding" two large tables in R?

Is there an easy way of adding the counts for each category in two large tables in R?

... where the tables don't all have exactly the same values present (though they will mostly overlap):

Small example of what I am trying to do. Set up some data:

  x1 <- c(5, 3, 3, 6, 3, 3, 5, 5, 11, 2, 4, 9, 3, 5, 8, 2, 8, 5, 4, 8)
  x2 <- c(6, 10, 9, 17, 6, 7, 8, 11, 5, 12, 14, 5, 11, 7, 7)

  table(x1)
x1
 2  3  4  5  6  8  9 11 
 2  5  2  5  1  3  1  1 

 table(x2)
x2
 5  6  7  8  9 10 11 12 14 17 
 2  2  3  1  1  1  2  1  1  1 

Now I want to combine these tables as if I had done table(c(x1,x2)), getting:

 2  3  4  5  6  7  8  9 10 11 12 14 17 
 2  5  2  7  3  3  4  2  1  3  1  1  1 

But now imagine x1 and x2 are gone (and are really large so I really don't want to recreate them from the tables and actually do table(c(x1,x2))), all I want is to take the tables t1 and t2 and add their (often very large) counts... which I can do in several really clunky ways.

However, this seems like it should be both very common and very easily-solved problem (indeed, I reckon that t1 + t2 ought to work for tables with categories of the same type) but searching for questions on every search term I could think of didn't find anything.

Have I missed a really simple and obvious way to do this?

Edit:

To clarify, something like this (which I did) is not 'simple and obvious' for what must be a very common operation with tables:

 m <- merge(t1,t2,by.x="x1",by.y="x2",all=TRUE)
 m[is.na(m)] <- 0
 oo <- order(m$x1)
 t12 <- m[oo,2]+m[oo,3]
 names(t12) <- m[oo,1]

In particular this is really no simpler nor easier to follow than the brute force approach.

Upvotes: 8

Views: 171

Answers (3)

leerssej
leerssej

Reputation: 14958

In dplyr:

library(dplyr)
x1 <- c(5, 3, 3, 6, 3, 3, 5, 5, 11, 2, 4, 9, 3, 5, 8, 2, 8, 5, 4, 8)
x2 <- c(6, 10, 9, 17, 6, 7, 8, 11, 5, 12, 14, 5, 11, 7, 7)

# Transform "Tables" to DataFrames & standardize column names
df1 <- as.data.frame(table(x1)) %>% select(x = x1, Freq)
df2 <- as.data.frame(table(x2)) %>% select(x = x2, Freq)  

# Merge tables & aggregate results
Ttldf <- bind_rows(df1, df2) %>% group_by(x) %>% summarise(TtlFreq = sum(Freq))

For a good, concise introduction to summarising and piping the Vignette is a great resource: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html

For further information about ways to rapidly use summarization to further best effect, Markham's tutorial is very helpful: https://rpubs.com/justmarkham/dplyr-tutorial

Upvotes: 0

Arun
Arun

Reputation: 118779

Another way using tapply:

tapply(c(t1,t2), names(c(t1,t2)), sum)
# 10 11 12 14 17  2  3  4  5  6  7  8  9 
#  1  3  1  1  1  2  5  2  7  3  3  4  2 

Here's if you want a sorted output:

w <- c(t1,t2)
# edit: Following G.Grothendieck's suggestion to simplify it further
tapply(w, as.numeric(names(w)), sum)
#  2  3  4  5  6  7  8  9 10 11 12 14 17 
#  2  5  2  7  3  3  4  2  1  3  1  1  1 

Upvotes: 8

Daniel Fischer
Daniel Fischer

Reputation: 3380

As @PaulHiemstra said, mergeshould do the job. I am not too familiar with it, but this code should work (though there might be more efficient ways to do it...)

x1 <- c(5, 3, 3, 6, 3, 3, 5, 5, 11, 2, 4, 9, 3, 5, 8, 2, 8, 5, 4, 8)
x2 <- c(6, 10, 9, 17, 6, 7, 8, 11, 5, 12, 14, 5, 11, 7, 7)

tx1 <- table(x1)
tx2 <- table(x2)

df1 <- data.frame(names=names(tx1),values=as.vector(tx1))
df2 <- data.frame(names=names(tx2),values=as.vector(tx2))

mdf12 <- merge(df1,df2,by="names",all=TRUE)
mdf12[is.na(mdf12)] <- 0

counts <- mdf12[,2] + mdf12[,3]
names(counts) <- mdf12[,1]

counts[order(as.numeric(names(counts)))]
table(c(x1,x2))

I don't like the is.na step, but I do not know how to make it, that there are 0in the first place instead of NA.

Upvotes: 1

Related Questions