Why does this line take so long to run?

Question

I have the following code, which takes a graph and a collection of ids to exclude, and returns the ids of the nodes that are not in the exclusion list.

I have two versions of the code: one takes two lists to exclude, and the other takes one list. In the two-list version I use itertools.chain to combine the lists.

from itertools import chain

def GrapMinusNodes(Graph,nodes_to_exclude1,nodes_to_exclude2):
    return (item.GetId() for item in Graph.Nodes() if item.GetId() not in chain(nodes_to_exclude1,nodes_to_exclude2))

And this is the second version, which checks only the first list:

def GrapMinusNodes(Graph,nodes_to_exclude1,nodes_to_exclude2):
    return (item.GetId() for item in Graph.Nodes() if item.GetId() not in nodes_to_exclude1)

The first method runs 20% slower than the second one. What is the reason for that? Is there a way to make this code run faster?

tzaman · Accepted Answer

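Why are you using chain here? A membership test against an iterator such as a chain object is a linear scan, O(n) in the total number of excluded ids, and a new chain object has to be created for every item you check. Instead, build a set once, up front, and test membership against that, which is O(1) on average (this requires the ids to be hashable):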

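def GrapMinusNodes(Graph, nodes_to_exclude1, nodes_to_exclude2):
    # Build the set once, before the generator starts iterating
    exclude = set().union(nodes_to_exclude1, nodes_to_exclude2)
    return (item.GetId() for item in Graph.Nodes() if item.GetId() not in exclude)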
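
To check that the two approaches return the same ids, here is a minimal self-contained sketch. The Node and Graph classes are hypothetical stand-ins (the question does not say which graph library is used); the only assumption is that they expose GetId() and Nodes() as in the question. The function names minus_nodes_chain and minus_nodes_set are also made up for this illustration.

```python
from itertools import chain


class Node:
    """Hypothetical stand-in for a graph node; only GetId() is assumed."""

    def __init__(self, node_id):
        self._id = node_id

    def GetId(self):
        return self._id


class Graph:
    """Hypothetical stand-in for the graph; only Nodes() is assumed."""

    def __init__(self, ids):
        self._nodes = [Node(i) for i in ids]

    def Nodes(self):
        return iter(self._nodes)


def minus_nodes_chain(graph, exclude1, exclude2):
    # A new chain object is created and scanned linearly for every node.
    return (n.GetId() for n in graph.Nodes()
            if n.GetId() not in chain(exclude1, exclude2))


def minus_nodes_set(graph, exclude1, exclude2):
    # The set is built once; each lookup is a hash lookup, not a scan.
    exclude = set().union(exclude1, exclude2)
    return (n.GetId() for n in graph.Nodes() if n.GetId() not in exclude)


graph = Graph(range(10))
slow = list(minus_nodes_chain(graph, [1, 3], [5, 7]))
fast = list(minus_nodes_set(graph, [1, 3], [5, 7]))
print(slow == fast, fast)  # True [0, 2, 4, 6, 8, 9]
```

Both versions produce the same result; the difference is only in how much work each membership test does, so the gap should grow with the size of the exclusion lists.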
