sedavidw
sedavidw

Reputation: 11741

Overhead in networkx reverse function?

I have the following code:

import networkx

def reverse_graph(g):
    reversed = networkx.DiGraph()
    for e in g.edges():
        reversed.add_edge(e[1], e[0])
    return reversed


g = networkx.DiGraph()

for i in range(500000):
    g.add_edge(i, i+1)

g2 = g.reverse()
g3 = reverse_graph(g)

And according to my line profiler I am spending WAY more time reversing the graph using networkx (their reverse took about 21 sec, mine took about 7). The overhead seems high in this simple case, and it's even worse in other code I have with more complex objects. Is there something happening under the hood of networkx I'm not aware of? This seems like it should be a relatively cheap function.

For reference, here is the doc for the reverse function

EDIT: I also tried running the implementations the other way around (i.e. mine first) to make sure there was no cacheing happening when they created theirs. Mine was still significantly faster

Upvotes: 2

Views: 418

Answers (1)

unutbu
unutbu

Reputation: 880429

The source code for the reverse method looks like this:

def reverse(self, copy=True):
    """Return the reverse of the graph.

    The reverse is a graph with the same nodes and edges
    but with the directions of the edges reversed.

    Parameters
    ----------
    copy : bool optional (default=True)
        If True, return a new DiGraph holding the reversed edges.
        If False, reverse the reverse graph is created using
        the original graph (this changes the original graph).
    """
    if copy:
        H = self.__class__(name="Reverse of (%s)"%self.name)
        H.add_nodes_from(self)
        H.add_edges_from( (v,u,deepcopy(d)) for u,v,d 
                          in self.edges(data=True) )
        H.graph=deepcopy(self.graph)
        H.node=deepcopy(self.node)
    else:
        self.pred,self.succ=self.succ,self.pred
        self.adj=self.succ
        H=self
    return H

So by default, when copy=True, not only are the edge nodes reversed, but also a deepcopy of any edge data is made. Then the graph attributes (held in self.graph) are deepcopied, and then the nodes themselves are deepcopied. That's a lot of copying that reverse_graph does not do.

If you don't deepcopy everything, modifying g3 could affect g.

If you don't need to deepcopy everything, (and if mutating g is acceptable) then

g.reverse(copy=False)

is even faster than

g3 = reverse_graph(g)

In [108]: %timeit g.reverse(copy=False)
1000000 loops, best of 3: 359 ns per loop

In [95]: %timeit reverse_graph(g)
1 loops, best of 3: 1.32 s per loop

In [96]: %timeit g.reverse()
1 loops, best of 3: 4.98 s per loop

Upvotes: 2

Related Questions