Colas

Reputation: 154

Why is Networkx consuming all my memory?

Using NetworkX on Python 2.7, I've been trying to build a graph of about 2M users and 880M edges. I'm using a text file of about 17 GB containing the edge list. I tried the function nx.read_edgelist(), but after it consumed about 250 GB of RAM (I'm working on a remote server), my program got killed.

My question is: is it normal for NetworkX to use that much memory? Or did I make a mistake collecting the data? I've been thinking of using another library, and I found both iGraph and graph-tool, which look pretty efficient. Would anyone have any advice on that?

Thank you!

EDIT: my file actually contains 880M edges, not 88M

Upvotes: 1

Views: 2248

Answers (1)

Joel

Reputation: 23887

(I'm not 100% sure of this, but no-one else has answered yet, so I'll give it a go).

First, each edge is saved twice (once for each node), so the memory can quickly grow. However, that's probably not your biggest problem.
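To see what I mean, here's a tiny pure-Python sketch of the dict-of-dicts layout NetworkX uses for undirected graphs (a simplification of the real internals, for illustration only): each edge (u, v) gets an entry under both endpoints.

```python
# Simplified sketch of NetworkX's dict-of-dicts adjacency storage.
# Each undirected edge (u, v) is recorded twice: under adj[u] and adj[v].
adj = {}

def add_edge(adj, u, v):
    adj.setdefault(u, {})[v] = {}  # per-edge attribute dict
    adj.setdefault(v, {})[u] = {}  # reverse entry roughly doubles the memory

add_edge(adj, 1, 2)
add_edge(adj, 1, 3)

print(adj)  # {1: {2: {}, 3: {}}, 2: {1: {}}, 3: {1: {}}}
```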

It is likely that your node names are all integers. However, if you haven't told read_edgelist that they are ints, they'll be treated as strings. The memory used for a string is huge compared to an int. Here's the signature of read_edgelist:

read_edgelist(path, comments='#', delimiter=None, create_using=None, nodetype=None, data=True, edgetype=None, encoding='utf-8')
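You can check the size difference yourself with sys.getsizeof (the exact byte counts are CPython- and version-specific, but the gap is always there for short numeric IDs):

```python
import sys

# Per-object overhead in CPython: a short numeric string costs
# noticeably more than the int it represents. Exact sizes vary by
# Python version and platform.
node_as_str = "123456789"
node_as_int = 123456789

print(sys.getsizeof(node_as_str))  # bigger
print(sys.getsizeof(node_as_int))  # smaller
```

Multiply that per-node and per-edge overhead by hundreds of millions of edges and it adds up fast.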

If this is your problem, it'll be fixed by using

G = nx.read_edgelist(path, nodetype=int)
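If memory is still tight even with int nodes, and you only need per-node statistics rather than the full graph object, you could stream the file instead of building a graph at all. A rough sketch (my own helper, not a NetworkX function) that counts degrees line by line:

```python
from collections import Counter

def degree_counts(lines):
    """Count node degrees from an iterable of 'u v' edge lines,
    skipping '#' comment lines, without building a graph in memory."""
    degrees = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        u, v = map(int, line.split()[:2])
        degrees[u] += 1
        degrees[v] += 1
    return degrees

print(degree_counts(["1 2", "1 3", "# comment"]))
# Counter({1: 2, 2: 1, 3: 1})
```

With a real file you'd pass the open file handle directly, e.g. `with open(path) as f: degrees = degree_counts(f)`, so only one line is in memory at a time.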

Upvotes: 1
