dvmlls

Reputation: 2246

Aggregating one matrix by values in another matrix

As I am wont to do, I'm keeping tabs on my cats using matrices.

catWeights <- cbind(fluffy=c(5.0,5.1,5.2,5.3),misterCuddles=c(1.2,1.3,1.4,1.5),captainMew=c(4.3,4.2,4.1,4.0))
catTypes <- cbind(fluffy=c('cat','cat','cat','cat'),misterCuddles=c('kitten','kitten','kitten','cat'),captainMew=c('cat','cat','cat','cat'))
dates <- c("2013-01-01", "2013-01-02", "2013-01-03","2013-01-04")
row.names(catWeights) <- dates
row.names(catTypes) <- dates

On any date, I know how much each of them weighs:

> catWeights
           fluffy misterCuddles captainMew
2013-01-01    5.0           1.2        4.3
2013-01-02    5.1           1.3        4.2
2013-01-03    5.2           1.4        4.1
2013-01-04    5.3           1.5        4.0

And I know whether they're cats or kittens:

> catTypes
           fluffy misterCuddles captainMew
2013-01-01 "cat"  "kitten"      "cat"     
2013-01-02 "cat"  "kitten"      "cat"     
2013-01-03 "cat"  "kitten"      "cat"     
2013-01-04 "cat"  "cat"         "cat"  

How can I tell how much all my cats and all my kittens weigh through time?

I want this:

> totalWeights

             cat    kitten
2013-01-01   9.3       1.2
2013-01-02   9.3       1.3
2013-01-03   9.3       1.4
2013-01-04  10.8       0.0

On the fourth of January, Mister Cuddles turned 1, so he was no longer a kitten. His weight moved from the kitten bucket to the cat bucket.

Upvotes: 1

Views: 70

Answers (3)

dvmlls

Reputation: 2246

Microbenchmarking alexis_laz's two solutions on a 2500x2500 matrix with 10 groups:

> microbenchmark(cbindLapply(), sapplyOnly(), times=100)
Unit: milliseconds
          expr      min       lq   median       uq      max neval
 cbindLapply() 841.4796 865.2220 879.9099 892.6265 990.5915   100
  sapplyOnly() 846.3675 869.7372 879.0286 901.3314 979.6136   100
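The definitions of cbindLapply() and sapplyOnly() are not shown above. A minimal sketch of how such a benchmark might be set up, assuming the two functions simply wrap the two forms from alexis_laz's answer and that the data is random (the names weights, types and groups, the seed, and the reduced matrix size are illustrative assumptions, not the original setup):

```r
# Hypothetical reconstruction of the benchmark setup. The post used a
# 2500x2500 matrix with 10 groups; n is kept small here so the sketch runs
# quickly. Set n <- 2500 to match the post.
set.seed(1)
n <- 200
groups <- paste0("g", 1:10)
weights <- matrix(runif(n * n), nrow = n)
types <- matrix(sample(groups, n * n, replace = TRUE), nrow = n)

# lapply + do.call(cbind, ...) form
cbindLapply <- function() {
  do.call(cbind, lapply(groups, function(x) rowSums(weights * (types == x))))
}

# sapply form
sapplyOnly <- function() {
  sapply(groups, function(x) rowSums(weights * (types == x)))
}

# Both forms should return the same numbers; only the column names differ.
stopifnot(isTRUE(all.equal(unname(cbindLapply()), unname(sapplyOnly()))))

# The timing itself needs the microbenchmark package, so it is skipped
# when that package is not installed.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(cbindLapply(), sapplyOnly(), times = 10))
}
```

Timings from a sketch like this will depend on the machine and on n, so they are not directly comparable to the numbers above.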

Upvotes: 0

alexis_laz

Reputation: 13122

This seems valid using the sample data:

do.call(cbind, 
      lapply(c("cat", "kitten"), 
            function(x) rowSums(catWeights * (catTypes == x))))
#           [,1] [,2]
#2013-01-01  9.3  1.2
#2013-01-02  9.3  1.3
#2013-01-03  9.3  1.4
#2013-01-04 10.8  0.0

EDIT:

As @BlueMagister commented ... lapply(unique(as.vector(catTypes)), ... is the more general form of the answer. I guess, though, you've already found a way to overcome this, since you accepted the answer. The as.vector is because unique has a matrix method that is not convenient in this specific case.
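To illustrate the point about unique, here is a small sketch using the catTypes matrix from the question (rowWise and types are just illustrative variable names):

```r
catTypes <- cbind(fluffy = c("cat", "cat", "cat", "cat"),
                  misterCuddles = c("kitten", "kitten", "kitten", "cat"),
                  captainMew = c("cat", "cat", "cat", "cat"))

# On a matrix, unique() uses the matrix method, which drops duplicated rows
# and returns a matrix rather than the distinct values.
rowWise <- unique(catTypes)
dim(rowWise)   # 2 rows and 3 columns remain

# Flattening to a plain vector first gives the distinct values themselves.
types <- unique(as.vector(catTypes))
types          # "cat" "kitten"
```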

Also, since I'm in editing mode, I'll note that sapply could've been used, but based on some rough benchmarks I've made from time to time, I've found lapply to be faster even when it is accompanied by a do.call(rbind/cbind, ...) or an unlist. I did not test it on a larger dataset in this specific case, though.

So, another format of the answer could've been:

sapply(unique(as.vector(catTypes)), 
             function(x) rowSums(catWeights * (catTypes == x)))
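A side effect worth noting: because sapply is given a character vector, it uses those values as names, so the columns come out labelled "cat" and "kitten" instead of [,1] and [,2]. A self-contained sketch that rebuilds the question's data and checks the result against the desired totalWeights (the expected matrix is typed in by hand from the question):

```r
catWeights <- cbind(fluffy = c(5.0, 5.1, 5.2, 5.3),
                    misterCuddles = c(1.2, 1.3, 1.4, 1.5),
                    captainMew = c(4.3, 4.2, 4.1, 4.0))
catTypes <- cbind(fluffy = c("cat", "cat", "cat", "cat"),
                  misterCuddles = c("kitten", "kitten", "kitten", "cat"),
                  captainMew = c("cat", "cat", "cat", "cat"))
dates <- c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04")
row.names(catWeights) <- dates
row.names(catTypes) <- dates

# (catTypes == x) is a logical mask; multiplying by it zeroes out the
# weights that belong to other types before the rows are summed.
totalWeights <- sapply(unique(as.vector(catTypes)),
                       function(x) rowSums(catWeights * (catTypes == x)))

colnames(totalWeights)  # "cat" "kitten"

# Desired result from the question, entered by hand.
expected <- cbind(cat = c(9.3, 9.3, 9.3, 10.8),
                  kitten = c(1.2, 1.3, 1.4, 0.0))
row.names(expected) <- dates
stopifnot(isTRUE(all.equal(totalWeights, expected)))
```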

Upvotes: 4

ialm

Reputation: 8717

Here is a not very general answer that applies only to the example data set.

# Construct matrices for the cat weights and kitten weights.
# Pass the full catWeights matrix to ifelse(): a subset such as
# catWeights[catTypes=="cat"] is a shorter vector that gets recycled
# and ends up misaligned with the positions being filled.
catWts <- ifelse(catTypes=="cat", catWeights, 0)
kittenWts <- ifelse(catTypes=="kitten", catWeights, 0)

# Well, then just take the row sums for the two matrices
catSums <- rowSums(catWts)
kittenSums <- rowSums(kittenWts)

# Then combine it to a data frame
totalWeights <- data.frame(cat=catSums, kitten=kittenSums)

# In one line
data.frame(cat=rowSums(ifelse(catTypes=="cat", catWeights, 0)),
           kitten=rowSums(ifelse(catTypes=="kitten", catWeights, 0)))

#            cat kitten
#2013-01-01  9.3    1.2
#2013-01-02  9.3    1.3
#2013-01-03  9.3    1.4
#2013-01-04 10.8    0.0

I would imagine that there is a more general approach to solving this problem.

Upvotes: 0
