Reputation: 5719
I have a matrix called mymat. I want to create another matrix containing every pairwise combination of the items in mymat, with their values added together, to get something like the result shown below.
mymat<- structure(c("AOGC-03-0122", "AOGC-05-0009", "AOGC-08-0006", "AOGC-08-0032",
"AOGC-08-0054", "0.000971685122254438", "0.00114138129544444",
"0.000779586347096811", "0.00132807674454652", "0.000867219894408284"
), .Dim = c(5L, 2L), .Dimnames = list(NULL, c("samples", "value"
)))
result
combination total.value
AOGC-03-0122+AOGC-03-0122 0.00194337
AOGC-03-0122+AOGC-05-0009 0.002113066
.
.
.
AOGC-08-0054+AOGC-08-0054 0.00173444
Upvotes: 1
Views: 49
Reputation: 35314
A matrix is a homogeneous data object. It is basically a matrix-classed atomic vector with a dimension attribute (ignoring the case of a matrix of lists). You cannot have a combination of strings and numbers in a single matrix. When you want to store a table of data with heterogeneous column types, you should use a data.frame. The appropriate types of the samples and value columns are clearly string and number, respectively. Hence, your input matrix should really be a data.frame, and your output should be a data.frame as well, since it merely recombines the input records.
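As a minimal sketch of the coercion described above (the names m and mydf and the toy values are made up for illustration, not taken from the question), combining strings and numbers in one matrix silently turns everything into character, whereas a data.frame keeps each column's own type:

```r
# Toy input: two sample IDs and two made-up numeric values (illustrative only)
m <- cbind(samples = c("AOGC-03-0122", "AOGC-05-0009"),
           value   = c(0.5, 0.25))

# The numbers were coerced to strings, because a matrix holds a single type
typeof(m)

# A data.frame can hold a character column next to a numeric column
mydf <- data.frame(samples = m[, "samples"],
                   value   = as.double(m[, "value"]),
                   stringsAsFactors = FALSE)
sapply(mydf, class)
```

This is the same conversion that the as.double() call in the code further down performs on the value column.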
You shouldn't need to call merge() here, and certainly not twice; vectorized indexing can do the job. Using merge() will also cause the permutation order to depend on the lexicographic order of the samples values, rather than the order in which they occur in the input, which is probably undesirable.
values <- as.double(mymat[,'value']);
with(expand.grid(rep(list(seq_len(nrow(mymat))),2L)),
data.frame(
combination=paste(mymat[Var2,'samples'],mymat[Var1,'samples'],sep='+'),
total.value=values[Var2]+values[Var1]
)
);
## combination total.value
## 1 AOGC-03-0122+AOGC-03-0122 0.001943370
## 2 AOGC-03-0122+AOGC-05-0009 0.002113066
## 3 AOGC-03-0122+AOGC-08-0006 0.001751271
## 4 AOGC-03-0122+AOGC-08-0032 0.002299762
## 5 AOGC-03-0122+AOGC-08-0054 0.001838905
## 6 AOGC-05-0009+AOGC-03-0122 0.002113066
## 7 AOGC-05-0009+AOGC-05-0009 0.002282763
## 8 AOGC-05-0009+AOGC-08-0006 0.001920968
## 9 AOGC-05-0009+AOGC-08-0032 0.002469458
## 10 AOGC-05-0009+AOGC-08-0054 0.002008601
## 11 AOGC-08-0006+AOGC-03-0122 0.001751271
## 12 AOGC-08-0006+AOGC-05-0009 0.001920968
## 13 AOGC-08-0006+AOGC-08-0006 0.001559173
## 14 AOGC-08-0006+AOGC-08-0032 0.002107663
## 15 AOGC-08-0006+AOGC-08-0054 0.001646806
## 16 AOGC-08-0032+AOGC-03-0122 0.002299762
## 17 AOGC-08-0032+AOGC-05-0009 0.002469458
## 18 AOGC-08-0032+AOGC-08-0006 0.002107663
## 19 AOGC-08-0032+AOGC-08-0032 0.002656153
## 20 AOGC-08-0032+AOGC-08-0054 0.002195297
## 21 AOGC-08-0054+AOGC-03-0122 0.001838905
## 22 AOGC-08-0054+AOGC-05-0009 0.002008601
## 23 AOGC-08-0054+AOGC-08-0006 0.001646806
## 24 AOGC-08-0054+AOGC-08-0032 0.002195297
## 25 AOGC-08-0054+AOGC-08-0054 0.001734440
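The ordering point made earlier can be checked on a small toy input. This is only a sketch: the data frames keys and vals and their contents are hypothetical. With its default sort = TRUE, merge() returns rows sorted on the key column, while plain index lookup via match() keeps the input order:

```r
# Hypothetical toy data whose keys are deliberately not in alphabetical order
keys <- data.frame(samples = c("b", "a", "c"), stringsAsFactors = FALSE)
vals <- data.frame(samples = c("b", "a", "c"),
                   value   = c(2, 1, 3),
                   stringsAsFactors = FALSE)

# merge() sorts the result on the 'by' column by default (sort = TRUE)
merged <- merge(keys, vals, by = "samples")
merged$samples

# Vectorized indexing looks up each key in place, preserving input order
idx <- match(keys$samples, vals$samples)
looked_up <- vals$value[idx]
looked_up
```

In the example, merged comes back in the order a, b, c, while looked_up follows the original b, a, c order.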
bgoldst <- function(mymat) {
    values <- as.double(mymat[,'value']);
    with(expand.grid(rep(list(seq_len(nrow(mymat))),2L)),
        data.frame(
            combination=paste(mymat[Var2,'samples'],mymat[Var1,'samples'],sep='+'),
            total.value=values[Var2]+values[Var1]
        )
    );
};
akrun <- function(mymat) {
    d1 <- expand.grid(rep(list(mymat[, "samples"]),2));
    d2 <- data.frame(samples=mymat[,1], value = as.numeric(mymat[,2]), stringsAsFactors=FALSE);
    d3 <- merge(merge(d1, d2, by.x="Var1", by.y="samples", all.x=TRUE), d2, by.x="Var2", by.y= "samples");
    res <- data.frame(combination = do.call(paste, c(d3[1:2], sep="+")), total.value = d3[,3]+d3[,4]);
};
identical(bgoldst(mymat),akrun(mymat));
## [1] TRUE
library(microbenchmark);
microbenchmark(bgoldst(mymat),akrun(mymat));
## Unit: microseconds
## expr min lq mean median uq max neval
## bgoldst(mymat) 390.875 412.685 444.4554 433.8535 457.589 662.434 100
## akrun(mymat) 1603.697 1658.009 1789.0585 1692.0075 1824.793 3227.921 100
N <- 1e3; mymat <- matrix(c(sprintf('sample_%d',seq_len(N)),runif(N)),ncol=2L,dimnames=list(NULL,c('samples','value')));
x <- bgoldst(mymat); y <- akrun(mymat); identical(structure(transform(x[order(x$combination),],combination=as.character(combination)),row.names=seq_len(nrow(x))),structure(transform(y[order(y$combination),],combination=as.character(combination)),row.names=seq_len(nrow(y)))); ## annoyingly involved line of code to obviate row order, factor levels order, and row names differences
## [1] TRUE
microbenchmark(bgoldst(mymat),akrun(mymat),times=3L);
## Unit: seconds
## expr min lq mean median uq max neval
## bgoldst(mymat) 8.103589 8.328722 8.418285 8.553854 8.575633 8.597411 3
## akrun(mymat) 30.777301 31.152458 31.348615 31.527615 31.634272 31.740929 3
Upvotes: 2
Reputation: 887213
We can use expand.grid with merge:
d1 <- expand.grid(rep(list(mymat[, "samples"]),2))
d2 <- data.frame(samples=mymat[,1], value = as.numeric(mymat[,2]),
stringsAsFactors=FALSE)
d3 <- merge(merge(d1, d2, by.x="Var1", by.y="samples", all.x=TRUE),
d2, by.x="Var2", by.y= "samples")
res <- data.frame(combination = do.call(paste, c(d3[1:2], sep="+")),
total.value = d3[,3]+d3[,4])
head(res,3)
# combination total.value
#1 AOGC-03-0122+AOGC-03-0122 0.001943370
#2 AOGC-03-0122+AOGC-05-0009 0.002113066
#3 AOGC-03-0122+AOGC-08-0006 0.001751271
Upvotes: 1