Taie
Reputation: 1189

Getting memory error for graph clustering even for 128 GB of memory. Why?

I am using Python on a Linux server with 128 GB of memory. I am clustering a graph with the Markov clustering (MCL) algorithm. The details of the process are as follows:

Graphtype = nx.Graph()
G = nx.from_pandas_edgelist(df, 'source','target', edge_attr='weight', create_using=Graphtype)

Graph details:

Name: 
Type: Graph
Number of nodes: 4533801
Number of edges: 10548751
Average degree:   4.6534

Is the graph connected?

nx.is_connected(G)
False

Number of connected components

print(nx.number_connected_components(G))
7254

Markov Clustering

import markov_clustering as mc
import networkx as nx

matrix = nx.to_scipy_sparse_matrix(Gc) # build the matrix
result = mc.run_mcl(matrix)            # run MCL with default parameters
clusters = mc.get_clusters(result)     # get clusters

MemoryError

Why am I still getting a memory error message when trying to extract the clusters? What is the issue? How can I go around this?

UPDATE:

Reporting results, taking into account the comments given: [screenshot not available]

Upvotes: 3

Views: 639

Answers (1)

houseofleft
Reputation: 447

I'd assume from your code that you're using 32-bit Python, which means that, regardless of hardware, you won't be able to use more than 4 GB of RAM.

Upgrading to 64-bit Python raises the theoretical limit to 16 EB of addressable memory, which will let you use the additional RAM you have on the server.
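Before upgrading, it may be worth confirming which build you are actually running. The snippet below is a minimal sketch using only the standard library; the helper name python_bitness is made up for illustration.

```python
import struct
import sys


def python_bitness() -> int:
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


bits = python_bitness()
# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
is_64bit = sys.maxsize > 2**32
print(f"{bits}-bit Python, 64-bit build: {is_64bit}")
```

If this reports a 64-bit build, the 4 GB limit does not apply and the cause of the MemoryError lies elsewhere.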

You could also save some memory by not keeping unnecessary variables around, which lets Python free objects that are no longer needed. From what I can see in these lines of code:

matrix = nx.to_scipy_sparse_matrix(Gc) # build the matrix
result = mc.run_mcl(matrix)            # run MCL with default parameters
clusters = mc.get_clusters(result)     # get clusters

The 'matrix' and 'result' variables are only used to arrive at 'clusters', so in theory they don't need to be kept. This code should allow Python to free a bit of memory:

clusters = mc.get_clusters(mc.run_mcl(nx.to_scipy_sparse_matrix(Gc)))

Obviously, you're sacrificing legibility and elegance, and it's unlikely that this will free up enough memory to solve your issue, but it's worth pointing out just in case.
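If you would rather keep the three separate steps, releasing each intermediate with del as soon as it is no longer needed should have a similar effect. The sketch below uses small, hypothetical stand-in functions (build_matrix, run_algorithm, extract_clusters) in place of nx.to_scipy_sparse_matrix, mc.run_mcl and mc.get_clusters, so that it runs without either library; it only illustrates the pattern.

```python
import gc


def build_matrix(n):
    # Hypothetical stand-in for nx.to_scipy_sparse_matrix(Gc): an n x n identity.
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def run_algorithm(matrix):
    # Hypothetical stand-in for mc.run_mcl(matrix): returns a copy of the input.
    return [row[:] for row in matrix]


def extract_clusters(result):
    # Hypothetical stand-in for mc.get_clusters(result).
    return [(i,) for i, row in enumerate(result) if row[i] > 0]


matrix = build_matrix(4)
result = run_algorithm(matrix)
del matrix      # drop the last reference so the object can be freed
clusters = extract_clusters(result)
del result      # same for the intermediate result
gc.collect()    # optional: also collect any reference cycles
print(clusters)
```

Note that neither this nor the one-liner lowers the peak memory used inside each call, since the input still has to exist while the function is running.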

Upvotes: 3
