Reputation: 1
I have a sparse adjacency matrix M of size 12,000 x 12,000 in R that I would like to transfer to another piece of software. To do so, I have to convert it to a three-column data.frame, with col1 being the column name of the matrix entry, col2 the row name, and col3 the value M[i,j]. I only want to create an entry in the data.frame if M[i,j] is not 0 (keeping the logic of the sparse matrix).
I have seen a lot of questions asking how to do the opposite, so I guess it is not that complicated, but I can't find how to do this efficiently.
Thanks for your help
Upvotes: 0
Views: 1062
Reputation: 57697
First, I'm going to assume that you have a regular sparse matrix, as created via the Matrix package. That is, it is stored in compressed column form (class dgCMatrix), where the nonzero entries are encoded in terms of their values, row indices, and column offsets.
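To make that encoding concrete, here is a minimal sketch on a tiny matrix. The 3 x 3 matrix and its two entries are made up for illustration; the slot names `x`, `i` and `p` are those of the Matrix package's `dgCMatrix` class.

```r
library(Matrix)

# Hypothetical 3 x 3 matrix with two nonzero entries: M[1, 2] = 5 and M[3, 1] = 7
M <- sparseMatrix(i = c(1, 3), j = c(2, 1), x = c(5, 7), dims = c(3, 3))

class(M)  # "dgCMatrix": compressed, column-oriented storage
M@x       # nonzero values, ordered column by column: 7 5
M@i       # zero-based row index of each value: 2 0
M@p       # column offsets: 0 1 2 2
          # (column k owns entries p[k] + 1 through p[k + 1] of x and i)
```

Note that there is no slot holding the column of each entry directly; it has to be derived from `p`, which is why the triplet form is more convenient here.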
The Matrix package has an alternate representation of a sparse matrix as a set of triplets, where the nonzero values are encoded in terms of their coordinates. This is basically what you want. Converting to this form is easy, as it turns out; and then you can turn it into a data frame.
One wart is that the coordinates are zero-based (i.e., elements in the first row are encoded as row 0), which you may or may not want to convert to one-based.
library(Matrix)
# some sample data
m <- rsparsematrix(12000, 12000, 1e-7)
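# convert to triplet form; coercing to the virtual class "TsparseMatrix" yields a
# dgTMatrix here and avoids the deprecation warning that as(m, "dgTMatrix")
# gives in Matrix >= 1.5
mm <- as(m, "TsparseMatrix")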
# convert to data frame: convert to 1-based indexing
data.frame(i=mm@i + 1, j=mm@j + 1, x=mm@x)
# i j x
#1 144 624 0.16
#2 3898 1106 -1.80
#3 11444 1395 0.89
#4 3981 2300 0.27
#5 3772 3602 -0.42
#6 2674 4058 0.79
#7 4446 4943 0.58
#8 4550 6629 0.82
#9 4125 6867 -0.86
#10 3151 7865 -0.42
#11 11590 8019 -0.96
#12 4808 9428 -1.30
#13 10453 11141 0.39
#14 11112 11592 -1.40
If you want the row/column names as opposed to numbers (this requires the matrix to have dimnames, which the random sample above does not):
data.frame(i=rownames(mm)[mm@i + 1], j=colnames(mm)[mm@j + 1], x=mm@x)
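Putting this together in the column order the question asks for (column name, row name, value), a minimal sketch might look like the following. The node names and the two entries are hypothetical, and the final filter is only there in case the matrix holds explicitly stored zeros.

```r
library(Matrix)

# Hypothetical 3-node adjacency matrix with named rows and columns
nodes <- c("a", "b", "c")
M <- sparseMatrix(i = c(1, 3), j = c(2, 1), x = c(5, 7),
                  dims = c(3, 3), dimnames = list(nodes, nodes))

# Triplet form: one (i, j, x) entry per stored value, zero-based indices
tm <- as(M, "TsparseMatrix")

# col1 = column name, col2 = row name, col3 = value
edges <- data.frame(col = colnames(tm)[tm@j + 1],
                    row = rownames(tm)[tm@i + 1],
                    value = tm@x,
                    stringsAsFactors = FALSE)

# Drop explicitly stored zeros, if there are any
edges <- edges[edges$value != 0, ]
```

Only the stored entries are touched, so the cost scales with the number of nonzeros rather than with the 12,000 x 12,000 dimensions.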
Upvotes: 3
Reputation: 52008
Under the hood, a base R matrix is just a vector stored in column-major order. You could use which to get the vector indices of the nonzero items and then do some modular arithmetic to reconstruct the row and column indices:
set.seed(123)
M <- matrix(sample(0:2,12,replace = TRUE,prob = c(0.8,0.1,0.1)),nrow = 3)
v <- which(M != 0)
rows <- 1 + (v-1) %% nrow(M)
cols <- 1 + (v-1) %/% nrow(M)
nonzeros <- data.frame(i=rows,j=cols,item=M[v])
In this example:
> M
[,1] [,2] [,3] [,4]
[1,] 0 2 0 0
[2,] 0 1 2 1
[3,] 0 0 0 0
> nonzeros
i j item
1 1 2 2
2 2 2 1
3 2 3 2
4 2 4 1
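As a variation on the same idea, base R's `which` can return the row and column indices itself via `arr.ind = TRUE`, which avoids the modular arithmetic. This is a sketch on a hard-coded copy of the matrix printed above, so it does not depend on the random seed.

```r
# Same 3 x 4 matrix as in the example, entered column by column
M <- matrix(c(0, 0, 0,
              2, 1, 0,
              0, 2, 0,
              0, 1, 0), nrow = 3)

# Two-column matrix of (row, col) positions of the nonzero entries
idx <- which(M != 0, arr.ind = TRUE)

# Indexing M with a two-column matrix picks out exactly those entries
nonzeros <- data.frame(i = idx[, "row"], j = idx[, "col"], item = M[idx])
```

Like the original, this works on an ordinary dense matrix, which for 12,000 x 12,000 doubles takes a bit over 1 GB of memory.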
Upvotes: -1