Reputation: 833
My initial goal was to do some structural property analysis (diameter, clustering coefficient, etc.) using NetworkX. However, I already stumbled when simply trying to count the edges in the given graph. This graph, which can be downloaded from over here (beware: 126 MB zip file), consists of 1,632,803 nodes and 30,622,564 edges. Please note: if you download this file, make sure to remove the comment lines (the ones starting with #) at the top of the file.
I have 8 GB of memory in my machine. Are my plans (diameter/clustering coefficient) too ambitious for a graph of this size? I hope not, because I like NetworkX for its simplicity, and it seems complete. If they are too ambitious, however, could you please advise another library that I can use for this job?
import networkx as nx

graph = nx.Graph()
graph.to_directed()

def create_undirected_graph_from_file(path, graph):
    for line in open(path):
        edges = line.rstrip().split()
        graph.add_edge(edges[0], edges[1])


print(create_undirected_graph_from_file("C:\\Users\\USER\\Desktop\\soc-pokec-relationships.txt", graph).g.number_of_edges())
Error:
Traceback (most recent call last):
  File "C:/Users/USER/PycharmProjects/untitled/main.py", line 12, in <module>
    print(create_undirected_graph_from_file("C:\\Users\\USER\\Desktop\\soc-pokec-relationships.txt", graph).g.number_of_edges())
  File "C:/Users/User/PycharmProjects/untitled/main.py", line 8, in create_undirected_graph_from_file
    edges = line.rstrip().split()
MemoryError
Upvotes: 2
Views: 490
Reputation: 1779
One potential problem is that strings have a large memory footprint. Since all of your node IDs are integers, you can benefit from converting them to ints before creating the edges. You'll get faster lookups internally and also a lower memory footprint. Specifically:
def create_undirected_graph_from_file(path, graph):
    for line in open(path):
        a, b = line.rstrip().split()
        graph.add_edge(int(a), int(b))
    return graph
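To get a rough feel for the difference, here is a minimal sketch that compares the size of one node ID stored as a string and as an int. The example ID is only an illustration (it matches the node count from the question), and the exact byte counts are CPython-specific and vary between versions.

```python
import sys

# Illustrative node ID; any value of similar magnitude behaves the same way.
node_id = "1632803"

size_as_str = sys.getsizeof(node_id)        # size of the str object
size_as_int = sys.getsizeof(int(node_id))   # size of the equivalent int object

print(f"str: {size_as_str} bytes, int: {size_as_int} bytes")
```

With about 30 million edges, each contributing two endpoints, even a few dozen bytes saved per ID can add up to a noticeable share of 8 GB.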
I'd also recommend changing your open call to use a context manager, which ensures the file gets closed:
def create_undirected_graph_from_file(path, graph):
    with open(path) as f:
        for line in f:
            a, b = line.rstrip().split()
            graph.add_edge(int(a), int(b))
    return graph
Or the magic one-liner:
def create_undirected_graph_from_file(path, graph):
    with open(path) as f:
        [graph.add_edge(*(int(point) for point in line.rstrip().split())) for line in f]
    return graph
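If you would rather not edit the comment lines out of the file by hand, one possible approach is to skip them while parsing. The sketch below assumes comment lines start with # and that each data line holds two whitespace-separated integer IDs; iter_edges is a hypothetical helper name, not part of NetworkX.

```python
from typing import Iterable, Iterator, Tuple


def iter_edges(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Lazily yield (source, target) integer pairs from edge-list lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment lines instead of failing on them.
        if not line or line.startswith("#"):
            continue
        a, b = line.split()[:2]
        yield int(a), int(b)


# Small in-memory sample standing in for the real file.
sample = ["# comment header", "1 13", "1 11", "", "2 1"]
edges = list(iter_edges(sample))
print(edges)
```

Because the helper is a generator, an open file object can be passed to it directly, for example graph.add_edges_from(iter_edges(f)) inside the with block, so the file is never materialized as a list of lines.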
One more thing to keep in mind: Graph.to_directed returns a new graph, so be sure to assign the result back to graph instead of throwing it away.
Upvotes: 2