Nick Larsen

Reputation: 18877

Reduce columns of a matrix by a function in R

I have a matrix like this:

data <- round(runif(30)*10)
dimnames <- list(c("1","2","3","4","5"),c("1","2","3","2","3","2"))
values <- matrix(data, ncol=6, dimnames=dimnames)
#   1 2 3 2  3 2
# 1 5 4 9 6  7 8
# 2 6 9 9 1  2 5
# 3 1 2 5 3 10 1
# 4 6 5 1 8  6 4
# 5 6 4 5 9  4 4

Some of the column names are repeated. I want to collapse the columns that share a name into a single column by taking, for each row, the minimum of the values in those columns. For this particular matrix, the result would look like this:

#   1 2 3
# 1 5 4 7
# 2 6 1 2
# 3 1 1 5
# 4 6 4 1
# 5 6 4 4

The actual data set has around 50,000 columns and 4,500 rows. No values are missing, and the result will have around 40,000 columns. I tried melting the data, then using group_by from dplyr, before reshaping back to a matrix. The problem is that building the data frame from the melt takes very long, and I'd like to be able to iterate faster.

Upvotes: 2

Views: 463

Answers (1)

akrun

Reputation: 887223

We can split the column indices by column name and apply rowMins from the matrixStats package to each group of columns:

library(matrixStats)
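# Group the column indices by column name, then take the row-wise
# minimum over each group; drop=FALSE keeps single-column groups as matrices
res <- vapply(split(seq_len(ncol(values)), colnames(values)),
    function(i) rowMins(values[,i,drop=FALSE]), numeric(nrow(values)))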
res
#     1 2 3
#[1,] 5 4 7
#[2,] 6 1 2
#[3,] 1 1 5
#[4,] 6 4 1
#[5,] 6 4 4

# Restore the row names of the original matrix
row.names(res) <- row.names(values)
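If matrixStats is not available, the same grouping idea can be sketched in base R by passing each group's columns to pmin. This is only an illustration, not a benchmarked alternative: the matrix below is a hard-coded copy of the example printed in the question (the original uses runif, so its values change on every run), and the names groups and idx are made up for this sketch.

```r
# Fixed copy of the example matrix from the question (filled column by column)
values <- matrix(
  c(5, 6, 1, 6, 6,    # column "1"
    4, 9, 2, 5, 4,    # column "2"
    9, 9, 5, 1, 5,    # column "3"
    6, 1, 3, 8, 9,    # column "2"
    7, 2, 10, 6, 4,   # column "3"
    8, 5, 1, 4, 4),   # column "2"
  ncol = 6,
  dimnames = list(as.character(1:5), c("1", "2", "3", "2", "3", "2"))
)

# Column indices grouped by column name: "1" -> 1, "2" -> 2 4 6, "3" -> 3 5
groups <- split(seq_len(ncol(values)), colnames(values))

# For each group, hand its columns to pmin() as separate arguments,
# which returns the element-wise (that is, row-wise) minimum
res <- vapply(
  groups,
  function(idx) do.call(pmin, lapply(idx, function(j) values[, j])),
  numeric(nrow(values))
)

# Set the row names explicitly rather than relying on vapply to carry them over
rownames(res) <- rownames(values)
res
```

On this fixed input, res has columns "1", "2", "3" and should hold the same values as the expected result shown in the question. Whether this is faster or slower than rowMins on a 4,500 x 50,000 matrix is not something the sketch establishes; it would need to be timed on the real data.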

Upvotes: 4
