NineWasps

Reputation: 2253

Networkx: how to optimize building a graph

I use networkx to build a graph whose nodes are keywords extracted from texts. I tried two approaches and noticed one strange thing. The first approach is

def build_graph(all_nodes):
    G = nx.Graph()
    for nodes in all_nodes:
        for pair_of_nodes in itertools.combinations(set(nodes), 2):
            u, v = pair_of_nodes
            if G.has_edge(u, v):
                G[u][v]['weight'] += 1
            else:
                G.add_edge(u, v, weight=1)
    return G

and the second one is

def build_graph_2(all_nodes):
    nodes_weighted = Counter()
    for nodes in all_nodes:
        nodes_weighted.update(itertools.combinations(set(nodes), 2))
    return nx.Graph((x, y, {'weight': v}) for (x, y), v in nodes_weighted.items())

For testing, I generated a random sample of the form [[text_1_kw_1, text_1_kw_2, text_1_kw_3, ...], [text_2_kw_1, text_2_kw_2, ...], ...]:

texts = [[random.randint(0, 100) for _ in range(100)] for _ in range(500)]
texts = [list(map(str, text)) for text in texts]

Here the first approach gave

713 ms ± 27.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And the second one

135 ms ± 9.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The second approach looks more attractive.

But a keyword is usually longer than 10 characters, so I made a new test sample:

texts = [[random.randint(10000, 10000000000) for _ in range(100)] for _ in range(500)]
texts = [list(map(str, text)) for text in texts]

However, on this sample the first approach gives

3.04 s ± 91.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And the second one

3.56 s ± 332 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The advantage of the second approach over the first has shrunk. I don't fully understand why this happens, and how can I build the graph faster than with the first approach?

Upvotes: 0

Views: 398

Answers (1)

ravenspoint

Reputation: 20457

It looks to me like you are addressing nodes by their text, which can be very long. So every time you look up a node, a long text has to be compared against other long texts - horribly slow.

Suggestion: Use an integer index to address the nodes. First build a map from index (key) to the associated text (value). Then build the graph using node indices and work with the indices internally. Only look up a node's text in the map when you are outputting human-readable results.
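
A minimal sketch of that idea in Python, applied to the builder from the question (the function and variable names are mine, not part of the original post):

import itertools
from collections import Counter
import networkx as nx

def build_graph_indexed(all_texts):
    # Sketch of the integer-index suggestion; names here are illustrative.
    index = {}                       # keyword text -> integer node id
    edge_weights = Counter()
    for keywords in all_texts:
        ids = {index.setdefault(kw, len(index)) for kw in keywords}
        # Sorting the ids gives a canonical pair order, so (u, v) and (v, u)
        # are never counted as two different edges.
        edge_weights.update(itertools.combinations(sorted(ids), 2))
    G = nx.Graph((u, v, {'weight': w}) for (u, v), w in edge_weights.items())
    # Keep the id -> keyword mapping for producing human-readable output later.
    G.graph['labels'] = {i: kw for kw, i in index.items()}
    return G

When the original keywords are needed again, something like nx.relabel_nodes(G, G.graph['labels']) can translate the integer nodes back into their texts at the very end.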

Alternative: Since your node texts are unique, you can use a set instead of a map. Code it so that the node index and the position of the text in the set are the same. The code required will be simpler.
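
Python's built-in set is not positional, so a rough rendering of this alternative is a sorted list of the unique keywords, with the list position serving as the node id (again, the names below are my own):

import bisect

def build_keyword_index(all_texts):
    # Rough Python analogue of the "set" alternative: a sorted list of the
    # unique keywords, where the position in the list is the node id.
    return sorted({kw for text in all_texts for kw in text})

def node_id(keywords, kw):
    # Binary search recovers the id without a separate keyword -> id map;
    # keywords[i] recovers the text for node i directly.
    return bisect.bisect_left(keywords, kw)

That keeps a single structure for both directions of the lookup, at the cost of a binary search per keyword during the build.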

Upvotes: 1
