Reputation: 211
Hi, I am trying to generate an adjacency matrix with a dimension of about 24,000 from a CSV with two columns listing pairs of genes and a third column of 1's indicating that an interaction is present. My goal is for the matrix to be square and filled with zeros for combinations that do not appear in the two columns.
I am using the following Python script:
import numpy as np
from scipy.sparse import coo_matrix
l, c, v = np.loadtxt("biogrid2.csv", dtype=int, skiprows=0, delimiter=",").T[:3, :]
m = coo_matrix((v, (l - 1, c - 1)), shape=(l.max(), c.max()))
m.toarray()
and it runs OK until it encounters the following error:
File "/home/charlie/anaconda3/lib/python3.6/site-packages/scipy/sparse/base.py", line 1184, in _process_toarray_args
return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError
Any ideas about how to get around the memory limit in SciPy?
Thanks
Upvotes: 1
Views: 756
Reputation: 14399
Most likely what you want isn't m.toarray() but m.tocsr(). A CSR matrix can do simple linear algebra (like .dot() and matrix powers) natively. For instance, this works:
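m = m.tocsr()  # tocsr() returns a new matrix, so keep the result
random_walk_2 = m.dot(m)
random_walk_n = m ** n  # n is a positive integer number of steps; m must be square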
# see https://stackoverflow.com/questions/28702416/matrix-power-for-sparse-matrix-in-python
Covariance should be implementable as well, but I'm not sure what the specific implementation would be without seeing what your current process is.
EDIT: To turn the output back into a simpler format to write out to CSV, you can follow up by converting back to COO with .tocoo():
m = m.tocoo()  # tocoo() also returns a new matrix
out = np.c_[m.data, m.row, m.col].T
np.savetxt("foo.csv", out, delimiter=",")
# see https://stackoverflow.com/questions/6081008/dump-a-numpy-array-into-a-csv-file
Upvotes: 1
Reputation: 2699
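The function toarray() will convert your 24000*24000 sparse matrix (coo_matrix) into a dense 24000*24000 array. Assuming you are loading 4-byte integers, that needs at least
24000*24000*4 bytes = 2,304,000,000 bytes, or around 2.15 GiB of memory. With 8-byte integers, which is what dtype=int usually gives on 64-bit Linux, it is twice that.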
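The arithmetic above can be checked with a short sketch. The helper name dense_bytes is made up for illustration, and the dtypes listed are only candidates; which one applies depends on how the CSV was loaded.

```python
import numpy as np

n = 24_000  # matrix dimension from the question


def dense_bytes(n, dtype):
    """Bytes needed for an n x n dense array of the given dtype (hypothetical helper)."""
    return n * n * np.dtype(dtype).itemsize


for dtype in (np.int8, np.int32, np.int64):
    size = dense_bytes(n, dtype)
    print(f"{np.dtype(dtype).name}: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

Even the smallest integer type still needs more than half a gigabyte for the dense array, so shrinking the dtype helps but does not remove the problem.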
To avoid using so much memory, you should avoid converting to a dense matrix (using toarray()) and do your operations on the sparse matrix instead.
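As a minimal sketch of staying sparse, the example below builds a square adjacency matrix from a small made-up edge list and computes a couple of results without ever calling toarray(). The edges array is a hypothetical stand-in for the CSV contents, and the 1-based gene IDs are an assumption taken from the "- 1" in the question's script.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical stand-in for the CSV: gene_a, gene_b, interaction flag.
edges = np.array([[1, 2, 1],
                  [2, 3, 1],
                  [4, 1, 1]])
rows, cols, vals = edges.T

# Use one size for both axes so the matrix is square.
n = int(max(rows.max(), cols.max()))
adj = coo_matrix((vals, (rows - 1, cols - 1)), shape=(n, n)).tocsr()

# Results that stay sparse or are small by construction:
out_degree = np.asarray(adj.sum(axis=1)).ravel()  # one number per row
symmetric = adj.maximum(adj.T)                    # undirected version, still sparse

print(adj.shape, adj.nnz)
print(out_degree)
```

Only the stored entries take memory here, so the same pattern should scale to the 24,000-node case as long as the number of interactions is modest.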
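If you need your matrix squared, you can use m*m (or m.dot(m)) for the matrix product, or m.multiply(m) for the element-wise square. These are different operations, but either way you will get a sparse matrix.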
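The two kinds of "squared" give different results, which a tiny example makes visible. The 3-node path graph below is invented for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Tiny directed path 0 -> 1 -> 2 as a stand-in adjacency matrix.
a = csr_matrix(np.array([[0, 1, 0],
                         [0, 0, 1],
                         [0, 0, 0]]))

matrix_square = a.dot(a)            # matrix product: counts 2-step paths
elementwise_square = a.multiply(a)  # squares each stored entry

# toarray() is fine here only because the example is 3 x 3.
print(matrix_square.toarray())
print(elementwise_square.toarray())
```

For a 0/1 adjacency matrix the element-wise square is just the matrix itself, while the matrix product has a single entry at (0, 2) for the two-step path.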
To save your matrix you have several options.
The simplest one is NPZ; see https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.sparse.save_npz.html or "Save / load scipy sparse csr_matrix in portable data format".
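A rough round-trip with scipy.sparse.save_npz and load_npz might look like the following. The file name adjacency.npz and the small matrix are placeholders, and a temporary directory is used so the sketch does not leave files behind in the working directory.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import coo_matrix, load_npz, save_npz

# Small stand-in for the real adjacency matrix.
data = np.array([1, 1, 1])
row = np.array([0, 1, 3])
col = np.array([1, 2, 0])
m = coo_matrix((data, (row, col)), shape=(4, 4))

path = os.path.join(tempfile.mkdtemp(), "adjacency.npz")  # hypothetical file name
save_npz(path, m)          # writes the sparse matrix without densifying it
restored = load_npz(path)  # comes back as a sparse matrix

print(restored.shape, restored.nnz)
```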
If you want to get your result in the same form as your initial CSV file, coo_matrix has the following attributes, which can be used to create the CSV file:
data: COO format data array of the matrix
row: COO format row index array of the matrix
col: COO format column index array of the matrix
See https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html
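One way to turn those three attributes back into CSV rows is sketched below. It assumes the original file used 1-based gene IDs in the order row, column, value (as the "- 1" in the question's script suggests), and it writes to an in-memory buffer; a file name could be passed to np.savetxt instead.

```python
import io

import numpy as np
from scipy.sparse import coo_matrix

# Small stand-in for the result matrix.
data = np.array([1, 1, 1])
row = np.array([0, 1, 3])
col = np.array([1, 2, 0])
m = coo_matrix((data, (row, col)), shape=(4, 4))

coo = m.tocoo()  # make sure row/col/data are available
# One CSV line per stored entry: row, column, value (indices shifted back to 1-based).
triples = np.column_stack((coo.row + 1, coo.col + 1, coo.data))

buffer = io.StringIO()
np.savetxt(buffer, triples, fmt="%d", delimiter=",")
csv_text = buffer.getvalue()
print(csv_text)
```

Passing fmt="%d" keeps the integers readable; without it np.savetxt writes them in scientific notation.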
Upvotes: 0