Tochoka

Reputation: 1

From sparse matrix to data frame

I have a sparse adjacency matrix M of size 12,000 x 12,000 in R, and I would like to transfer it to another piece of software. To do so, I have to convert it to a 3-column data.frame: col1 holds the column name of the entry in my matrix, col2 its row name, and col3 the value M[i,j]. I only want a row in the data.frame when M[i,j] is not 0 (keeping the logic of the sparse matrix).

I have seen many questions asking how to do the opposite, so I guess it is not that complicated, but I can't find an efficient way to do it.

Thanks for your help

Upvotes: 0

Views: 1062

Answers (2)

Hong Ooi

Reputation: 57697

First, I'm going to assume that you have a regular sparse matrix, as created via the Matrix package (class dgCMatrix). That is, the nonzero entries are encoded in terms of their values, their row indices, and column offsets (pointers to where each column's entries start).
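In Matrix's default compressed sparse column class, dgCMatrix, the slot x holds the nonzero values, i their zero-based row indices, and p the zero-based column pointers: the entries of column k sit at positions p[k] + 1 to p[k + 1] of x (none when the two are equal). As a minimal base-R sketch of how those three vectors expand into triplets, the vectors below are written by hand for an invented 3 x 4 matrix rather than taken from a real object, so the Matrix package is not needed to run it.

```r
# Hand-written CSC vectors mimicking dgCMatrix slots (zero-based) for:
#      [,1] [,2] [,3] [,4]
# [1,]    0  5.0    0    0
# [2,]  1.5  2.0    0  7.0
# [3,]    0    0    0    0
x <- c(1.5, 5.0, 2.0, 7.0)   # nonzero values, stored column by column
i <- c(1L, 0L, 1L, 1L)       # zero-based row index of each value
p <- c(0L, 1L, 3L, 3L, 4L)   # column pointers, length ncol + 1

# diff(p) is the number of nonzeros per column; repeating each column
# number that many times yields one (one-based) column index per value.
j <- rep(seq_along(diff(p)), times = diff(p))

# Shift the row indices to one-based and assemble the triplets.
triplets <- data.frame(i = i + 1L, j = j, x = x)
print(triplets)
```

For a real dgCMatrix m the same lines would use m@x, m@i and m@p; the as(..., "TsparseMatrix") coercion should give the same triplets without doing the arithmetic by hand.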

The Matrix package has an alternate representation of a sparse matrix as a set of triplets, where each nonzero value is stored together with its (row, column) coordinates. This is basically what you want. Converting to this form turns out to be easy, and from there you can build a data frame.

One wart is that the coordinates are zero-based (i.e., elements in the first row are encoded as row 0), so you may want to convert them to one-based indexing.

library(Matrix)
# some sample data
m <- rsparsematrix(12000, 12000, 1e-7)

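# convert to triplet form (recent Matrix versions deprecate as(., "dgTMatrix");
# coercing to the virtual class "TsparseMatrix" returns a dgTMatrix here)
mm <- as(m, "TsparseMatrix")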

# convert to data frame: convert to 1-based indexing
data.frame(i=mm@i + 1, j=mm@j + 1, x=mm@x)

#       i     j     x
#1    144   624  0.16
#2   3898  1106 -1.80
#3  11444  1395  0.89
#4   3981  2300  0.27
#5   3772  3602 -0.42
#6   2674  4058  0.79
#7   4446  4943  0.58
#8   4550  6629  0.82
#9   4125  6867 -0.86
#10  3151  7865 -0.42
#11 11590  8019 -0.96
#12  4808  9428 -1.30
#13 10453 11141  0.39
#14 11112 11592 -1.40

If you want the row/column names instead of numbers (this assumes the matrix has dimnames):

data.frame(i=rownames(mm)[mm@i + 1], j=colnames(mm)[mm@j + 1], x=mm@x)
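Here is a minimal sketch that puts the columns in the order the question asks for (column name, row name, value) and writes a CSV for the other software. Assumptions: the matrix carries dimnames (the small 3 x 3 stand-in and its names are invented), CSV is an acceptable exchange format, and the output path is only a temporary file.

```r
library(Matrix)

# Small stand-in for the real 12000 x 12000 matrix; names are invented.
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 2, 3), x = c(0.5, -1.2, 2.0),
                  dims = c(3, 3),
                  dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

# Triplet form; the virtual class "TsparseMatrix" is used instead of
# "dgTMatrix", which newer Matrix versions deprecate as a coercion target.
mm <- as(m, "TsparseMatrix")

# Question's layout: column name, row name, value (slots are zero-based).
edges <- data.frame(col   = colnames(m)[mm@j + 1],
                    row   = rownames(m)[mm@i + 1],
                    value = mm@x)

# Plain CSV without R's row names; replace tempfile() with a real path.
out <- tempfile(fileext = ".csv")
write.csv(edges, out, row.names = FALSE)
```

Only the nonzero entries are ever materialized, so this stays proportional to the number of edges rather than to 12000^2.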

Upvotes: 3

John Coleman

Reputation: 52008

Under the hood, a matrix is just a vector stored in column-major order. You could use which() to get the vector indices of the nonzero entries and then use modular arithmetic to reconstruct the row and column indices:

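set.seed(123)
# small example: a 3 x 4 matrix that is mostly zeros
M <- matrix(sample(0:2, 12, replace = TRUE, prob = c(0.8, 0.1, 0.1)), nrow = 3)
v <- which(M != 0)               # vector (column-major) indices of the nonzero entries
rows <- 1 + (v - 1) %% nrow(M)   # remainder recovers the row
cols <- 1 + (v - 1) %/% nrow(M)  # integer quotient recovers the column
nonzeros <- data.frame(i = rows, j = cols, item = M[v])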

In this example:

> M
     [,1] [,2] [,3] [,4]
[1,]    0    2    0    0
[2,]    0    1    2    1
[3,]    0    0    0    0
> nonzeros
  i j item
1 1 2    2
2 2 2    1
3 2 3    2
4 2 4    1
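To spell out the arithmetic: element (r, c) of an n-row matrix sits at vector index (c - 1) * n + r, so the remainder and quotient of (v - 1) / n give r - 1 and c - 1. The sketch below cross-checks this against base R's arrayInd() and then looks up names to produce the question's layout (column name, row name, value); the 3 x 3 matrix and its labels are invented. which(M != 0, arr.ind = TRUE) would also return the row and column indices directly. Note that this approach needs the matrix in dense form, which for 12000 x 12000 doubles is 144 million entries, about 1.15 GB.

```r
# A small named dense matrix; labels and values are invented for illustration.
M <- matrix(c(0, 3, 0,
              1, 0, 0,
              0, 0, 5), nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("x", "y", "z")))

# Vector indices of the nonzero entries, then row/column via %% and %/%.
v <- which(M != 0)
rows <- 1 + (v - 1) %% nrow(M)
cols <- 1 + (v - 1) %/% nrow(M)

# Base R's arrayInd() performs the same conversion; use it as a cross-check.
ref <- arrayInd(v, dim(M))
stopifnot(all(rows == ref[, 1]), all(cols == ref[, 2]))

# Question's layout: column name, row name, value.
edges <- data.frame(col   = colnames(M)[cols],
                    row   = rownames(M)[rows],
                    value = M[v])
print(edges)
```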

Upvotes: -1
