user3745220

Reputation: 41

Matching matrix elements

I have the following to deal with. I have two matrices

a b c
b d d
a d b

and

1 2 3
4 5 6
7 8 9

For each letter in the first matrix, I need to compute the mean of the elements at the same positions in the second matrix.

These two matrices are fairly trivial; the matrices I actually have to use are in the region of 100 x 100.

Any ideas would be welcome.

Thanks

Upvotes: 2

Views: 654

Answers (4)

zx8754

Reputation: 56054

You could use aggregate, though it might be the slowest option:

# m1 and m2 are the sample matrices from MrFlick's answer
aggregate(m2 ~ m1, data = data.frame(m1 = c(m1), m2 = c(m2)), FUN = mean)
# m1       m2
# 1  a 4.000000
# 2  b 5.000000
# 3  c 3.000000
# 4  d 6.333333

Upvotes: 0

akrun

Reputation: 887028

You could use data.table for bigger datasets:

library(data.table)
dt1 <- data.table(c(m1), c(m2))      # @MrFlick's datasets; columns default to V1, V2
dt1[, mean(V2), by = V1]             # result column is auto-named
dt1[, list(V2 = mean(V2)), by = V1]  # same means, result column named V2
#   V1       V2
#1:  a 4.000000
#2:  b 5.000000
#3:  d 6.333333
#4:  c 3.000000

Speed

set.seed(45)
m1N <- matrix(sample(letters[1:20], 1e3*1e3, replace=TRUE), ncol=1e3)
m2N <- matrix(sample(0:40, 1e3*1e3, replace=TRUE), ncol=1e3)

system.time(res1 <- tapply(m2N, m1N, mean))
#user  system elapsed 
# 7.605   0.004   7.618 

system.time({
  dt <- data.table(c(m1N), c(m2N))
  setkey(dt, V1)
  res2 <- dt[, mean(V2), by = V1]
})
#  user  system elapsed 
# 0.043   0.000   0.043 

system.time(res3 <- unlist(lapply(split(m2N, m1N),mean)))
#  user  system elapsed 
# 7.864   0.016   7.891 

system.time(res4 <- sapply(sort(unique.default(m1N)), function(x) mean(m2N[m1N == x])))
# user  system elapsed 
# 1.007   0.012   1.021 
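
Before comparing timings, it can be worth confirming that the approaches return the same means. The following is only a sketch in base R: it uses smaller random matrices than the benchmark (the 100 x 100 size is my assumption, taken from the question) and leaves out the data.table variant so that it runs without extra packages.

```r
# Small random inputs (size chosen only to keep the check fast)
set.seed(45)
m1N <- matrix(sample(letters[1:20], 100 * 100, replace = TRUE), ncol = 100)
m2N <- matrix(sample(0:40, 100 * 100, replace = TRUE), ncol = 100)

res1 <- tapply(m2N, m1N, mean)
res3 <- unlist(lapply(split(m2N, m1N), mean))
res4 <- sapply(sort(unique.default(m1N)), function(x) mean(m2N[m1N == x]))

# Same group labels, in the same order
stopifnot(identical(names(res1), names(res3)),
          identical(names(res3), names(res4)))

# Same means, up to floating-point tolerance
stopifnot(isTRUE(all.equal(as.vector(res1), unname(res3))),
          isTRUE(all.equal(unname(res3), unname(res4))))
```

If either stopifnot call fails, the timings are not comparing like with like.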

Upvotes: 3

Rich Scriven

Reputation: 99331

Since tapply calls split, you may find it more efficient on large matrices to call split directly:

> unlist(lapply(split(m, m2), mean)) 
### or slightly slower: sapply(split(m, m2), mean)
#        a        b        c        d 
# 4.000000 5.000000 3.000000 6.333333 

where

> m <- structure(c(1L, 4L, 7L, 2L, 5L, 8L, 3L, 6L, 9L), .Dim = c(3L,3L))
> m2 <- structure(c("a","b","a","b","d","d","c","d","b"), .Dim = c(3L, 3L))

Quick check:

> f <- function() tapply(m, m2, mean)
> g <- function() unlist(lapply(split(m, m2), mean))
> library(microbenchmark)
> microbenchmark(f(), g(), times = 1e4L)
# Unit: microseconds
#  expr     min       lq   median      uq       max neval
#   f() 421.083 432.2575 436.3975 440.503  3633.401 10000
#   g() 267.119 277.1495 280.2180 283.982 69714.687 10000
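
In the same spirit of skipping tapply's overhead, another base-R route is to compute group sums with rowsum and divide by the group counts. This is not part of the answer above and I have not benchmarked it, so treat it as a sketch; the names sums, counts and means are mine.

```r
# Same small matrices as in the answer above (m numeric, m2 labels)
m  <- matrix(1:9, nrow = 3, byrow = TRUE)
m2 <- matrix(c("a", "b", "a", "b", "d", "d", "c", "d", "b"), nrow = 3)

# One-column matrix of group sums, with the labels as row names
sums <- rowsum(as.vector(m), as.vector(m2))

# Group sizes, reordered by label so they line up with the sums
counts <- table(as.vector(m2))[rownames(sums)]

# Named vector of group means
means <- sums[, 1] / as.vector(counts)
means
```

On this input the means should agree with the split/lapply output shown above.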

Upvotes: 1

MrFlick

Reputation: 206197

You can use a simple tapply here. For example:

# sample input
m1 <- matrix(letters[c(1,2,1,2,4,4,3,4,2)], ncol=3)
m2 <- matrix(1:9, byrow=TRUE, ncol=3)

tapply(m2, m1, mean)
#        a        b        c        d 
# 4.000000 5.000000 3.000000 6.333333 

The fact that they are in matrices doesn't really matter, as long as the dimensions match up exactly.
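
To make the point about matching dimensions concrete, here is a small sketch: the pairing is purely positional, so flattening both matrices to vectors gives the same grouping. The stopifnot guard and the names by_matrix and by_vector are my additions.

```r
# Same sample input as above
m1 <- matrix(letters[c(1, 2, 1, 2, 4, 4, 3, 4, 2)], ncol = 3)
m2 <- matrix(1:9, byrow = TRUE, ncol = 3)

# Guard: element-wise pairing only makes sense if the shapes agree
stopifnot(identical(dim(m1), dim(m2)))

# Grouping the matrices directly, or their flattened (column-major) vectors
by_matrix <- tapply(m2, m1, mean)
by_vector <- tapply(as.vector(m2), as.vector(m1), mean)
stopifnot(isTRUE(all.equal(as.vector(by_matrix), as.vector(by_vector))))

# The values of m2 that fall into each group
split(m2, m1)
```

For the real 100 x 100 matrices the same guard applies; only the inputs change.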

Upvotes: 3
