Moody

Reputation: 833

MemoryError while counting edges in graph using Networkx

My initial goal was to do some structural property analysis (diameter, clustering coefficient, etc.) using Networkx. However, I already stumbled when simply trying to count the edges in the given graph. The graph, which can be downloaded from here (beware: 126 MB zip file), consists of 1,632,803 nodes and 30,622,564 edges. Note: if you download this file, make sure to remove the comment lines (the ones starting with #) at the top of the file.

I have 8 GB of memory in my machine. Are my plans (diameter/clustering coefficient) too ambitious for a graph of this size? I hope not, because I like networkx for its simplicity and it seems complete. If it is too ambitious, could you please advise another library that I can use for this job?

import networkx as nx

graph = nx.Graph()
graph.to_directed()

def create_undirected_graph_from_file(path, graph):
    for line in open(path):
        edges = line.rstrip().split()
        graph.add_edge(edges[0], edges[1])

print(create_undirected_graph_from_file("C:\\Users\\USER\\Desktop\\soc-pokec-relationships.txt", graph).g.number_of_edges())

Error:

Traceback (most recent call last):
  File "C:/Users/USER/PycharmProjects/untitled/main.py", line 12, in <module>
    print(create_undirected_graph_from_file("C:\\Users\\USER\\Desktop\\soc-pokec-relationships.txt", graph).g.number_of_edges())
  File "C:/Users/User/PycharmProjects/untitled/main.py", line 8, in create_undirected_graph_from_file
    edges = line.rstrip().split()
MemoryError

Upvotes: 2

Views: 490

Answers (1)

Eric Pauley

Reputation: 1779

One potential problem is that strings have a large memory footprint. Since all of your node IDs are integers, you can convert them to ints before creating the edges. Ints hash and compare faster than strings, and each one takes less memory. Specifically:

def create_undirected_graph_from_file(path, graph):
    for line in open(path):
        a, b = line.rstrip().split()
        graph.add_edge(int(a), int(b))
    return graph
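
As a rough check of the memory claim, here is a minimal sketch that compares the size of one node ID stored as a string and as an int. It assumes CPython (sys.getsizeof is implementation-specific), and the variable names are only for illustration. The exact byte counts vary by Python version and platform, and the graph's own dictionary overhead is not included.

```python
import sys

# One node ID as it would come out of line.split(), and the same ID as an int
node_as_str = "1632803"
node_as_int = int(node_as_str)

# Shallow size of each object in bytes (CPython-specific measurement)
str_size = sys.getsizeof(node_as_str)
int_size = sys.getsizeof(node_as_int)

print(f"str: {str_size} bytes, int: {int_size} bytes")
```

The per-object saving is small, but with 30,622,564 edges it is paid for every endpoint read from the file.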

I'd also recommend opening the file with a context manager (a with block), which ensures the file gets closed:

def create_undirected_graph_from_file(path, graph):
    with open(path) as f:
        for line in f:
            a, b = line.rstrip().split()
            graph.add_edge(int(a), int(b))
    return graph

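Or, more compactly, pass a generator to add_edges_from so that no intermediate list is built:

def create_undirected_graph_from_file(path, graph):
    with open(path) as f:
        # Each line yields one (int, int) edge tuple, consumed lazily
        graph.add_edges_from(tuple(map(int, line.split())) for line in f)
    return graph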

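One more thing to keep in mind: Graph.to_directed returns a new graph rather than modifying the graph in place, so assign the result (for example, graph = graph.to_directed()) instead of throwing it away. Also note that your original function had no return statement, so it returned None. The versions above return the graph, so call .number_of_edges() directly on the result (there is no .g attribute).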
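
The point about to_directed can be seen on a tiny graph. This is a minimal sketch, assuming networkx is installed; the variable names are only for illustration.

```python
import networkx as nx

undirected = nx.Graph()
undirected.add_edge(1, 2)

# Calling to_directed() without using the result leaves the original untouched
undirected.to_directed()
still_undirected = not undirected.is_directed()

# Keeping the returned object gives the directed copy
directed = undirected.to_directed()

print(still_undirected)        # the original graph is still undirected
print(directed.is_directed())  # the returned graph is directed
```

The directed copy holds each undirected edge in both directions, so for a graph of this size it needs noticeably more memory than the undirected original.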

Upvotes: 2
