Reputation: 1189
I am using Python on a Linux server with 128 GB of memory. I am clustering a graph with the Markov clustering (MCL) algorithm. The details of the process are as follows:
Graphtype = nx.Graph()
G = nx.from_pandas_edgelist(df, 'source','target', edge_attr='weight', create_using=Graphtype)
Name:
Type: Graph
Number of nodes: 4533801
Number of edges: 10548751
Average degree: 4.6534
nx.is_connected(G)
False
print(nx.number_connected_components(G))
7254
import markov_clustering as mc
import networkx as nx
matrix = nx.to_scipy_sparse_matrix(Gc) # build the matrix
result = mc.run_mcl(matrix) # run MCL with default parameters
clusters = mc.get_clusters(result) # get clusters
MemoryError
Why am I still getting a MemoryError when trying to extract the clusters? What is the issue, and how can I get around it?
UPDATE:
Reporting results taking into account the comments given.
Upvotes: 3
Views: 639
Reputation: 447
I'd assume from your code that you're using 32-bit Python, which means that regardless of hardware, you won't be able to use more than 4 GB of RAM.
Upgrading to 64-bit Python raises the theoretical limit to 16 EB of addressable memory, which will let you use the additional RAM available on your server.
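If you're not sure which build you have, you can check it with the standard library alone. This is just a minimal sketch; the variable names are my own:

```python
import platform
import struct
import sys

# The size of a C pointer in bytes, times 8, gives the interpreter's word size.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print(
    f"{platform.python_implementation()} {platform.python_version()}: "
    f"{pointer_bits}-bit, 64-bit build: {is_64bit}"
)
```

If this reports a 64-bit build, the 4 GB limit is not what you are hitting and the cause lies elsewhere.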
You could also save some memory by not keeping unnecessary variables around, which lets Python reclaim discarded objects. From what I can see in these lines of code:
matrix = nx.to_scipy_sparse_matrix(Gc) # build the matrix
result = mc.run_mcl(matrix) # run MCL with default parameters
clusters = mc.get_clusters(result) # get clusters
The 'matrix' and 'result' variables are only used to arrive at 'clusters', so in theory they don't need to be kept. Nesting the calls means no name holds on to the intermediates, so Python is free to release them:
clusters = mc.get_clusters(mc.run_mcl(nx.to_scipy_sparse_matrix(Gc)))
Obviously, you're sacrificing legibility and elegance, and it's unlikely that this will free up enough space to solve your issue, but it's worth pointing out just in case.
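If you'd rather keep the three-line version, deleting a name once you're done with it has the same effect. Here is a rough illustration of the idea using only the standard library. `Big`, `build` and `summarize` are hypothetical stand-ins for the sparse matrix and the MCL calls, not part of networkx or markov_clustering:

```python
import gc
import weakref


class Big:
    """Hypothetical stand-in for a large intermediate such as the sparse matrix."""

    def __init__(self, n):
        self.data = bytearray(n)  # roughly n bytes of memory


def build():
    # Stand-in for the step that produces the large intermediate.
    return Big(10_000_000)


def summarize(obj):
    # Stand-in for the step that consumes the intermediate.
    return len(obj.data)


intermediate = build()
probe = weakref.ref(intermediate)  # lets us observe when the object is freed
summary = summarize(intermediate)

del intermediate  # drop the only strong reference
gc.collect()      # not required on CPython for acyclic objects, but harmless

freed = probe() is None
print(summary, freed)
```

The weak reference is only there to show that the object really is gone after the `del`; in your own code the `del matrix` and `del result` lines are all you'd need.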
Upvotes: 3