I am doing graph clustering with Python. The algorithm requires the data from graph G to be passed as an adjacency matrix. However, when I try to get the adjacency matrix as a NumPy array like this:
import networkx as nx
matrix = nx.to_numpy_matrix(G)
I get a memory error:
MemoryError: Unable to allocate 2.70 TiB for an array with shape (609627, 609627) and data type float64
My machine is fairly new (Lenovo E490) and runs 64-bit Windows with 8 GB of RAM.
Other details that may be relevant:
Number of nodes: 609627
Number of edges: 915549
Graphtype = nx.Graph()
G = nx.from_pandas_edgelist(df, 'source','target', edge_attr='weight', create_using=Graphtype)
import markov_clustering as mc
import networkx as nx
matrix = nx.to_scipy_sparse_matrix(G) # build the matrix
result = mc.run_mcl(matrix) # run MCL with default parameters
This sparse-matrix attempt with markov_clustering also raises a MemoryError.
The matrix you are trying to create has 609627 x 609627 entries of type float64. With each float64 using 8 bytes, you would need 609627 * 609627 * 8 ≈ 3 TB of memory (the 2.70 TiB in the error message). Your system has only 8 GB, and even with more physical memory, 3 TB is far too large to work with. A smaller dtype does not solve the problem: uint32 (4 bytes per entry) would still need about 1.5 TB, and even a 1-byte type such as uint8 or bool would need about 372 GB.
What are you ultimately trying to compute? With 915,549 edges among 609,627 nodes, your matrix is extremely sparse (roughly 0.0005% of its entries are nonzero), so there is probably a better approach to your goal. A dense adjacency matrix is out of reach unless it is stored in compressed (sparse) form.
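The arithmetic above can be checked with a few lines of NumPy. This is a minimal sketch: the helper name dense_matrix_bytes is hypothetical, and it counts only the raw array buffer, ignoring any temporary copies made during conversion.

```python
import numpy as np


def dense_matrix_bytes(n_nodes: int, dtype) -> int:
    """Bytes needed for a dense n_nodes x n_nodes array of the given dtype (hypothetical helper)."""
    # Python ints do not overflow, so this is exact even for huge n_nodes.
    return n_nodes * n_nodes * np.dtype(dtype).itemsize


n = 609_627  # node count reported in the question
for dt in (np.float64, np.uint32, np.uint8):
    b = dense_matrix_bytes(n, dt)
    # Decimal terabytes and binary tebibytes (NumPy's error message uses TiB).
    print(f"{np.dtype(dt).name:>8}: {b / 1e12:.2f} TB ({b / 2**40:.2f} TiB)")
```

The float64 line reports about 2.97 TB (2.70 TiB), which matches the figure in the error message; the 1-byte uint8 case still needs roughly 0.37 TB.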
You may benefit from something like
to_scipy_sparse_matrix(G, nodelist=None, dtype=None, weight='weight', format='csr')
in the networkx package (in NetworkX 3.0 and later this function is called to_scipy_sparse_array). Alternatively, work directly from the edge list to calculate whatever you are trying to achieve.
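A minimal sketch of the sparse route, assuming NetworkX, SciPy and pandas are installed. The toy DataFrame and the helper names sparse_adjacency and csr_nbytes are hypothetical; the code uses to_scipy_sparse_array when it exists (NetworkX 2.7+) and otherwise falls back to to_scipy_sparse_matrix, which takes the same arguments.

```python
import networkx as nx
import pandas as pd


def sparse_adjacency(G: nx.Graph):
    """Return G's weighted adjacency matrix in CSR format and the node order of its rows."""
    nodelist = list(G)  # row/column i corresponds to nodelist[i]
    # Newer NetworkX provides to_scipy_sparse_array; older releases only
    # have to_scipy_sparse_matrix. Both accept the same arguments.
    to_sparse = getattr(nx, "to_scipy_sparse_array", None) or nx.to_scipy_sparse_matrix
    A = to_sparse(G, nodelist=nodelist, dtype=None, weight="weight", format="csr")
    return A, nodelist


def csr_nbytes(A) -> int:
    """Memory used by a CSR matrix: stored values + column indices + row pointers."""
    return A.data.nbytes + A.indices.nbytes + A.indptr.nbytes


# Hypothetical toy edge list with the same columns as the question's DataFrame.
df = pd.DataFrame(
    {
        "source": ["a", "a", "b", "c"],
        "target": ["b", "c", "c", "d"],
        "weight": [1.0, 2.0, 0.5, 3.0],
    }
)
G = nx.from_pandas_edgelist(df, "source", "target", edge_attr="weight", create_using=nx.Graph())

A, nodes = sparse_adjacency(G)
print(nodes)           # row/column order of the matrix
print(A.shape, A.nnz)  # each undirected edge is stored twice (symmetric matrix)
print(csr_nbytes(A), "bytes")
```

Because CSR stores only the nonzero values plus two index arrays, the question's graph (at most about 1.83 million stored entries, since each of the 915,549 undirected edges appears twice) should need roughly 24 MB with float64 values and 32-bit indices, rather than terabytes.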