Taie

Reputation: 1189

How to handle memory errors with adjacency matrix?

I am doing graph clustering with Python. The algorithm requires the data from graph G to be passed as an adjacency matrix. However, when I try to get the adjacency matrix as a numpy array like this:

import networkx as nx
matrix = nx.to_numpy_matrix(G)

I get a memory error:

MemoryError: Unable to allocate 2.70 TiB for an array with shape (609627, 609627) and data type float64

However, my machine is new (Lenovo E490), Windows 64-bit, with 8 GB of memory.

Other important information could be:

Number of nodes: 609627
Number of edges: 915549

The full workflow is as follows:

Graphtype = nx.Graph()
G = nx.from_pandas_edgelist(df, 'source','target', edge_attr='weight', create_using=Graphtype)

Markov Clustering

import markov_clustering as mc
import networkx as nx

matrix = nx.to_scipy_sparse_matrix(G) # build the matrix
result = mc.run_mcl(matrix)            # run MCL with default parameters

This also ends with a MemoryError.


Upvotes: 2

Views: 838

Answers (1)

Ehsan

Reputation: 12407

The matrix you are trying to create is of size 609627 x 609627 with dtype float64. With each float64 using 8 bytes of memory, you will need 609627 * 609627 * 8 bytes, roughly 3 TB. Your system has only 8 GB, and even with added physical memory, 3 TB is too large to work with. Assuming your node ids are integers, you could use dtype=uint32 (enough to represent all 609627 node ids), but the matrix would still need over a TB of memory, which is out of reach.

Given what you are trying to do, it seems you have a sparse matrix, so there is probably another approach to your goal; the dense adjacency matrix (unless compressed) seems hard to achieve.
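For reference, a quick back-of-the-envelope check of those figures (a minimal sketch; only the node count from the question is assumed):

n = 609_627

bytes_float64 = n * n * 8   # float64 = 8 bytes per entry
bytes_uint32 = n * n * 4    # a 4-byte integer dtype halves it

print(f"float64: {bytes_float64 / 2**40:.2f} TiB")  # ~2.70 TiB, matching the error message
print(f"uint32:  {bytes_uint32 / 2**40:.2f} TiB")   # still ~1.35 TiB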

Maybe you can benefit from something like:

to_scipy_sparse_matrix(G, nodelist=None, dtype=None, weight='weight', format='csr')

in the networkx package, as sketched below. Or rather, use the edge list directly to calculate whatever you are trying to achieve.
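As a rough illustration of keeping everything sparse end to end (a sketch, not tested on your data; it assumes df is your edge-list DataFrame from the question):

import networkx as nx
import numpy as np
import markov_clustering as mc

# Build the graph from the edge list as in the question.
G = nx.from_pandas_edgelist(df, 'source', 'target',
                            edge_attr='weight', create_using=nx.Graph())

# Convert to a compressed sparse row (CSR) matrix; memory scales with the
# number of edges (~915k here), not with 609627**2.
matrix = nx.to_scipy_sparse_matrix(G, weight='weight',
                                   dtype=np.float32, format='csr')

# A CSR matrix stores only the nonzero entries plus two index arrays.
mem_mb = (matrix.data.nbytes + matrix.indices.nbytes + matrix.indptr.nbytes) / 2**20
print(f"sparse adjacency matrix: {mem_mb:.1f} MiB")

# MCL accepts the sparse matrix directly.
result = mc.run_mcl(matrix)
clusters = mc.get_clusters(result)

The sparse representation itself should fit comfortably in 8 GB; whether the clustering step stays within memory depends on how dense the matrix becomes during MCL's expansion.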

Upvotes: 2
