Reputation: 18877
I have a matrix sort of like:
data <- round(runif(30)*10)
dimnames <- list(c("1","2","3","4","5"),c("1","2","3","2","3","2"))
values <- matrix(data, ncol=6, dimnames=dimnames)
# 1 2 3 2 3 2
# 1 5 4 9 6 7 8
# 2 6 9 9 1 2 5
# 3 1 2 5 3 10 1
# 4 6 5 1 8 6 4
# 5 6 4 5 9 4 4
Some of the column names are the same. I want to essentially reduce the columns in this matrix by taking the min of all values in the same row where the columns have the same name. For this particular matrix, the result would look like this:
# 1 2 3
# 1 5 4 7
# 2 6 1 2
# 3 1 1 5
# 4 6 4 1
# 5 6 4 4
The actual data set I'm using here has around 50,000 columns and 4,500 rows. None of the values are missing and the result will have around 40,000 columns. The way I tried to solve this was by melt
ing the data then using group_by
from dplyr before reshaping back to a matrix. The problem is that it takes forever to generate the data frame from the melt and I'd like to be able to iterate faster.
Upvotes: 2
Views: 463
Reputation: 887223
We can use rowMins
from library(matrixStats)
library(matrixStats)
res <- vapply(split(1:ncol(values), colnames(values)),
function(i) rowMins(values[,i,drop=FALSE]), rep(0, nrow(values)))
res
# 1 2 3
#[1,] 5 4 7
#[2,] 6 1 2
#[3,] 1 1 5
#[4,] 6 4 1
#[5,] 6 4 4
row.names(res) <- row.names(values)
Upvotes: 4