alevax
alevax

Reputation: 80

Applying element wise operation on a 2D matrix using summarized column information in R

I have a 2D matrix and I want to access every element in a vectorized operation. The operation I'd like to perform it is easy: take the value in the cell and add the min between all the values in the same column (or row). In other words, I want to apply a offset on every cell of the value that depend on all the values of the columns (or row) where the cell belongs to.

How can I do it? There is a easy way? I don't want to use loops or auxiliaries vectors.

Example

my.data <- matrix( seq(1:30) , nrow = 3 )
my.data
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    4    7   10   13   16   19   22   25    28
[2,]    2    5    8   11   14   17   20   23   26    29
[3,]    3    6    9   12   15   18   21   24   27    30

e.g. I want the column 2 to be c(8,9,10) and so on.

Thank you all.

Upvotes: 1

Views: 81

Answers (2)

bgoldst
bgoldst

Reputation: 35324

Here's an approach built around pmin().

By row:

my.data+do.call(pmin,as.data.frame(my.data));
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    2    5    8   11   14   17   20   23   26    29
## [2,]    4    7   10   13   16   19   22   25   28    31
## [3,]    6    9   12   15   18   21   24   27   30    33

By column:

t(t(my.data)+do.call(pmin,as.data.frame(t(my.data))));
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    2    8   14   20   26   32   38   44   50    56
## [2,]    3    9   15   21   27   33   39   45   51    57
## [3,]    4   10   16   22   28   34   40   46   52    58

Benchmarking

library(microbenchmark);

thepule.col <- function(m) apply(m,2L,function(x) x+min(x));
thepule.row <- function(m) t(apply(m,1L,function(x) x+min(x)));
bgoldst.col <- function(m) t(t(m)+do.call(pmin,as.data.frame(t(m))));
bgoldst.row <- function(m) m+do.call(pmin,as.data.frame(m));

## small scale (OP's example)
NR <- 3L; NC <- 10L; m <- matrix(seq_len(NR*NC),NR);

ex.col <- thepule.col(m);
identical(ex.col,bgoldst.col(m));
## [1] TRUE
ex.row <- thepule.row(m);
identical(ex.row,bgoldst.row(m));
## [1] TRUE

microbenchmark(thepule.col(m),bgoldst.col(m));
## Unit: microseconds
##            expr    min     lq     mean  median     uq     max neval
##  thepule.col(m) 36.351 40.200 46.30256 42.5520 50.891 107.769   100
##  bgoldst.col(m) 38.489 41.055 47.26482 42.9795 52.602 105.631   100
microbenchmark(thepule.row(m),bgoldst.row(m));
## Unit: microseconds
##            expr    min     lq     mean  median      uq     max neval
##  thepule.row(m) 31.647 33.358 36.83026 35.0685 36.5655 103.920   100
##  bgoldst.row(m) 44.477 47.043 51.16077 48.5395 50.6775 133.856   100

## medium scale
NR <- 1e2L; NC <- 1e3L; m <- matrix(seq_len(NR*NC),NR);

ex.col <- thepule.col(m);
identical(ex.col,bgoldst.col(m));
## [1] TRUE
ex.row <- thepule.row(m);
identical(ex.row,bgoldst.row(m));
## [1] TRUE

microbenchmark(thepule.col(m),bgoldst.col(m));
## Unit: milliseconds
##            expr      min       lq     mean   median       uq      max neval
##  thepule.col(m) 3.088080 3.193068 5.032765 4.344523 4.655640 38.55630   100
##  bgoldst.col(m) 1.692221 1.829712 3.128527 2.095925 3.117374 36.80976   100
microbenchmark(thepule.row(m),bgoldst.row(m));
## Unit: milliseconds
##            expr      min       lq     mean   median       uq      max neval
##  thepule.row(m) 1.779035 2.108755 4.472138 3.378242 3.708603 39.84353   100
##  bgoldst.row(m) 3.674818 3.942316 4.862983 4.173248 5.219714 40.63212   100

## large scale
NR <- 1e3L; NC <- 1e4L; m <- matrix(seq_len(NR*NC),NR);

ex.col <- thepule.col(m);
identical(ex.col,bgoldst.col(m));
## [1] TRUE
ex.row <- thepule.row(m);
identical(ex.row,bgoldst.row(m));
## [1] TRUE

microbenchmark(thepule.col(m),bgoldst.col(m));
## Unit: milliseconds
##            expr      min       lq     mean   median       uq      max neval
##  thepule.col(m) 194.4044 238.1129 244.0346 243.1782 248.4054 306.6919   100
##  bgoldst.col(m) 340.0291 382.0265 388.2586 387.3593 395.8638 463.1242   100
microbenchmark(thepule.row(m),bgoldst.row(m));
## Unit: milliseconds
##            expr      min       lq     mean   median       uq      max neval
##  thepule.row(m) 259.3259 282.6383 304.7577 294.8812 329.2672 376.1987   100
##  bgoldst.row(m) 142.1778 153.2129 167.5441 157.6520 169.3705 217.0170   100

Upvotes: 1

thepule
thepule

Reputation: 1751

How about:

my.data.mod <- apply(my.data, 2, function(x) x+min(x))
my.data.mod

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    8   14   20   26   32   38   44   50    56
[2,]    3    9   15   21   27   33   39   45   51    57
[3,]    4   10   16   22   28   34   40   46   52    58

If you want to do it by row:

my.data.mod <- t(apply(my.data, 1, function(x) x+min(x)))
my.data.mod

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    5    8   11   14   17   20   23   26    29
[2,]    4    7   10   13   16   19   22   25   28    31
[3,]    6    9   12   15   18   21   24   27   30    33

Upvotes: 4

Related Questions