I have a list of sparse matrices that all have the same number of rows but different, partly overlapping, sets of column names.
Here is a toy dataset:
library(dplyr)
library(Matrix)
ms <- list(
  m1 = data.frame(a = c(1, 10, 100), d = c(2, 20, 200), e = c(3, 30, 300)) %>% as.matrix %>% as("sparseMatrix"),
  m2 = data.frame(a = c(4, 40, 400), e = c(5, 50, 500), f = c(6, 60, 600), g = c(7, 70, 700)) %>% as.matrix %>% as("sparseMatrix"),
  m3 = data.frame(c = c(8, 80, 800), d = c(9, 90, 900)) %>% as.matrix %>% as("sparseMatrix")
)
I want to add all the matrices in ms together by column name, so that columns with the same name are summed. This is how I'm currently doing it:
# get a list of unique columns
final_names <- sapply(ms, colnames) %>% unlist %>% unique
# create an empty sparseMatrix of those dimensions
# (set_colnames() comes from magrittr; dplyr does not export it)
final_matrix <- matrix(0, nrow = nrow(ms$m1), ncol = length(final_names)) %>%
  magrittr::set_colnames(final_names) %>% as("sparseMatrix")
# add the matrices by column
for (mat in ms) {
  current_colnames <- colnames(mat)
  final_matrix[, current_colnames] <- mat + final_matrix[, current_colnames]
}
This is my output:
final_matrix
3 x 6 sparse Matrix of class "dgCMatrix"
       a    d   e   f   g   c
[1,]   5   11   8   6   7   8
[2,]  50  110  80  60  70  80
[3,] 500 1100 800 600 700 800
This works on the toy data, but on my real dataset it crashes R with a segmentation fault. Is there a better way to create the empty sparse matrix, or a different approach altogether?
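One thing worth noting about the empty-matrix step: matrix(0, ...) allocates a dense zero matrix of the full final size before it is converted to sparse, which can be expensive on a large dataset (whether this is what causes the segfault is not certain). A minimal sketch, assuming the Matrix package, that creates the all-zero sparse matrix directly with sparseMatrix() and then runs the same loop. The toy data is rebuilt with Matrix(..., sparse = TRUE) instead of the dplyr pipeline only to keep the block self-contained.

```r
library(Matrix)

# Toy data from the question, built directly as sparse matrices
ms <- list(
  m1 = Matrix(cbind(a = c(1, 10, 100), d = c(2, 20, 200), e = c(3, 30, 300)),
              sparse = TRUE),
  m2 = Matrix(cbind(a = c(4, 40, 400), e = c(5, 50, 500),
                    f = c(6, 60, 600), g = c(7, 70, 700)), sparse = TRUE),
  m3 = Matrix(cbind(c = c(8, 80, 800), d = c(9, 90, 900)), sparse = TRUE)
)

final_names <- unique(unlist(lapply(ms, colnames)))

# All-zero sparse matrix created directly: no dense nrow x ncol matrix of
# zeros is allocated first (x = numeric(0) makes it numeric, not a pattern matrix)
final_matrix <- sparseMatrix(
  i = integer(0), j = integer(0), x = numeric(0),
  dims = c(nrow(ms[[1]]), length(final_names)),
  dimnames = list(NULL, final_names)
)

# Same accumulation loop as in the question
for (mat in ms) {
  cols <- colnames(mat)
  final_matrix[, cols] <- final_matrix[, cols] + mat
}
```

Printing final_matrix should give the same 3 x 6 result as above. The loop itself is unchanged, so each sub-assignment still rebuilds the sparse structure; this sketch only removes the dense allocation.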
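Bind all the matrices together column-wise, then sum the columns that share each name:
NM = unique(unlist(lapply(ms, colnames)))  # all distinct column names
temp = do.call(cbind, ms)                    # one wide sparse matrix
# for each name, sum the matching columns row-wise (as.matrix() makes each group dense)
sapply(NM, function(nm) rowSums(as.matrix(temp[, colnames(temp) %in% nm])))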
#       a    d   e   f   g   c
#[1,]   5   11   8   6   7   8
#[2,]  50  110  80  60  70  80
#[3,] 500 1100 800 600 700 800
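A variant of the first approach that never converts to a dense matrix: sum same-named columns with a single sparse matrix product. This is a swapped-in technique, not the one used above, and G is a hypothetical indicator matrix introduced only for this sketch: G[k, g] is 1 when column k of temp carries the name NM[g], so temp %*% G adds up all columns that share a name.

```r
library(Matrix)

# Toy data from the question
ms <- list(
  m1 = Matrix(cbind(a = c(1, 10, 100), d = c(2, 20, 200), e = c(3, 30, 300)),
              sparse = TRUE),
  m2 = Matrix(cbind(a = c(4, 40, 400), e = c(5, 50, 500),
                    f = c(6, 60, 600), g = c(7, 70, 700)), sparse = TRUE),
  m3 = Matrix(cbind(c = c(8, 80, 800), d = c(9, 90, 900)), sparse = TRUE)
)

# One wide sparse matrix; duplicate column names are kept
temp <- do.call(cbind, unname(ms))
cols <- colnames(temp)
NM <- unique(cols)

# Hypothetical indicator matrix: one nonzero per row of G,
# placed in the column of the name that row's column belongs to
G <- sparseMatrix(i = seq_along(cols), j = match(cols, NM),
                  x = rep(1, length(cols)),
                  dims = c(length(cols), length(NM)),
                  dimnames = list(NULL, NM))

# Sparse %*% sparse: columns with the same name are summed
result <- temp %*% G
```

Both operands are sparse, so the product stays a sparse matrix and nothing of the full nrow x ncol size is materialized densely. How it performs on the real data is untested.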
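Alternatively, convert the matrices to data frames and let split.default() group the columns by name (the columns of the result come out in alphabetical order, because split() sorts the names):
# dense data frame with all columns side by side
temp = do.call(cbind, lapply(ms, function(x) as.data.frame(as.matrix(x))))
# group the columns by their original names and sum each group row-wise
sapply(split.default(temp, unlist(sapply(ms, colnames))), rowSums)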
#       a   c    d   e   f   g
#[1,]   5   8   11   8   6   7
#[2,]  50  80  110  80  60  70
#[3,] 500 800 1100 800 600 700
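The split() idea can also be applied to column positions of the sparse matrix instead of data frame columns, which skips the as.data.frame(as.matrix(x)) conversion. A sketch, assuming the Matrix package; idx is an illustrative name. Only the final nrow x (number of distinct names) result is a dense base matrix.

```r
library(Matrix)

# Toy data from the question
ms <- list(
  m1 = Matrix(cbind(a = c(1, 10, 100), d = c(2, 20, 200), e = c(3, 30, 300)),
              sparse = TRUE),
  m2 = Matrix(cbind(a = c(4, 40, 400), e = c(5, 50, 500),
                    f = c(6, 60, 600), g = c(7, 70, 700)), sparse = TRUE),
  m3 = Matrix(cbind(c = c(8, 80, 800), d = c(9, 90, 900)), sparse = TRUE)
)

temp <- do.call(cbind, unname(ms))

# Group column positions by column name; split() sorts the groups by name
idx <- split(seq_len(ncol(temp)), colnames(temp))

# Sum each group on the sparse matrix itself
# (drop = FALSE keeps single-column groups as matrices)
result <- sapply(idx, function(j) Matrix::rowSums(temp[, j, drop = FALSE]))
```

As with the data frame version, the columns of result are in alphabetical order.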