Mikhail Belousov

Reputation: 115

How to get rid of duplicates in a graph

I'm building a social graph from a list of tuples 'friends' like this:

(4118181 {'last_name': 'Belousov', 'first_name': 'Mikhail'})

Here's the function:

def addToGraph (g, start, friends):
    g.add_nodes_from(friends)
    egdes_to_add = [(start, entry[0]) for entry in friends]
    g.add_edges_from(edges_to_add)
    return g

As a result, I get a graph with twice as many nodes as expected. The first set, with attributes, comes from

g.add_nodes_from(friends)

and the second set, without attributes, comes from

g.add_edges_from(edges_to_add)

I read the docs, but I can't figure out how to add both nodes with attributes and edges between those nodes.

Upvotes: 0

Views: 2075

Answers (2)

Joel

Reputation: 23827

Your nodes are integers, but the endpoints of your edges are strings. When you add the nodes, NetworkX creates a set of nodes whose names are integers. When you add an edge, it sees an edge between the strings '4118181' and '340559596'. Python treats those strings as distinct from the integers 4118181 and 340559596, so NetworkX creates new nodes with the string names and puts an edge between them.

To fix this, you'll need to convert the strings to integers before adding the edges.
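
A minimal sketch of the mismatch and the fix described above, assuming NetworkX 2.0 or later. The names start, friend_ids, bad, and good are hypothetical, and the attributes for node 340559596 are made-up placeholders:

```python
import networkx as nx

# Node IDs are ints; the IDs used to build edges arrive as strings (assumed scenario).
friends = [
    (4118181, {'last_name': 'Belousov', 'first_name': 'Mikhail'}),
    (340559596, {'last_name': 'A', 'first_name': 'B'}),  # placeholder attributes
]
start = '4118181'
friend_ids = ['340559596']

# Mismatched types: each string endpoint becomes a new node without attributes.
bad = nx.Graph()
bad.add_nodes_from(friends)
bad.add_edges_from([(start, fid) for fid in friend_ids])
bad_count = bad.number_of_nodes()    # 4: two int nodes plus two str nodes

# Converting to int first makes the edges reuse the existing nodes.
good = nx.Graph()
good.add_nodes_from(friends)
good.add_edges_from([(int(start), int(fid)) for fid in friend_ids])
good_count = good.number_of_nodes()  # 2: only the original int nodes
```

In the second graph the edge connects the two nodes that already carry attributes, so no extra nodes appear.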

Upvotes: 1

edo

Reputation: 1869

Your function adds an edge between the node start and every node in friends. I ran your code and did not get any duplicate nodes. Here is my full example (note that I corrected a couple of errors in your code: the misspelled egdes_to_add variable and the missing comma inside the tuple).

import networkx as nx

friends = [
    (4118181, {'last_name': 'Belousov', 'first_name': 'Mikhail'}),
    (1111111, {'last_name': 'A', 'first_name': 'B'}),
    (2222222, {'last_name': 'C', 'first_name': 'D'}),
    (3333333, {'last_name': 'E', 'first_name': 'F'})
]

def addToGraph(g, start, friends):
    g.add_nodes_from(friends)
    edges_to_add = [(start, entry[0]) for entry in friends]
    g.add_edges_from(edges_to_add)

G = nx.Graph()
addToGraph(G, 4118181, friends)

print('Nodes:', G.nodes())
print('Edges:', G.edges())

Output:

Nodes: [3333333, 4118181, 2222222, 1111111]
Edges: [(3333333, 4118181), (4118181, 4118181), (4118181, 2222222), (4118181, 1111111)]
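
Note that the output contains the self-loop (4118181, 4118181), because start is itself one of the entries in friends. If that edge is unwanted, one possible variation is to skip start when building the edge list. This is only a sketch, assuming NetworkX 2.0 or later; the name add_to_graph and the filtering step are my own additions, not part of the original function:

```python
import networkx as nx

friends = [
    (4118181, {'last_name': 'Belousov', 'first_name': 'Mikhail'}),
    (1111111, {'last_name': 'A', 'first_name': 'B'}),
    (2222222, {'last_name': 'C', 'first_name': 'D'}),
    (3333333, {'last_name': 'E', 'first_name': 'F'}),
]

def add_to_graph(g, start, friends):
    """Add nodes with attributes, then connect start to every other friend."""
    g.add_nodes_from(friends)
    # Skip the entry whose ID equals start so no self-loop is created.
    edges_to_add = [(start, node_id) for node_id, _attrs in friends
                    if node_id != start]
    g.add_edges_from(edges_to_add)
    return g

G = add_to_graph(nx.Graph(), 4118181, friends)

node_count = G.number_of_nodes()  # 4, one per entry in friends
edge_count = G.number_of_edges()  # 3, start connected to the other three
```

The node attributes are still attached after the edges are added, which can be checked with G.nodes[4118181].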

Upvotes: 0
