DocZerø

Unable to generate network data in tree format while retaining node attributes

I'm trying to generate a network graph that visualises data lineage (a cluster graph such as this). Please keep in mind that I'm very new to the NetworkX library and that the code below might be far from optimal.

My data consist of two pandas DataFrames, df_objs (the objects, keyed by uuid) and df_calls (the calls between them):

Here's what I do to initialise the directed graph and create the nodes:

import networkx as nx
objs = df_objs.set_index('uuid').to_dict(orient='index')
g = nx.DiGraph()
for obj_id, obj_attrs in objs.items():
    # Unpack the dict so each column (e.g. 'name') becomes its own node attribute
    g.add_node(obj_id, **obj_attrs)

And to generate the edges:

g.add_edges_from(df_calls.drop_duplicates().to_dict(orient='split')['data'])
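For comparison, here is a possibly shorter way to build the same graph, sketched under two assumptions: NetworkX 2.0 or later (for nx.from_pandas_edgelist), and df_calls having two columns named calling and called. Those column names are hypothetical (the real ones are not shown), as is the sample data.

```python
import networkx as nx
import pandas as pd

# Hypothetical sample data; the real dataframes are not shown in the question.
df_objs = pd.DataFrame({'uuid': ['u1', 'u2', 'u3'],
                        'name': ['extract', 'transform', 'load']})
df_calls = pd.DataFrame({'calling': ['u1', 'u2', 'u2'],   # hypothetical column names
                         'called': ['u2', 'u3', 'u3']})

# Build the directed graph straight from the de-duplicated edge list.
g = nx.from_pandas_edgelist(df_calls.drop_duplicates(),
                            source='calling', target='called',
                            create_using=nx.DiGraph())

# Add every object with its attributes; for nodes that already exist
# (because they appear in an edge) the attributes are merged in.
objs = df_objs.set_index('uuid').to_dict(orient='index')
g.add_nodes_from(objs.items())

print(g.nodes(data=True))
print(list(g.edges()))
```

Adding the nodes with add_nodes_from() after the edges keeps objects that never appear in df_calls, just like the node loop above.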

Next, I want to know the lineage of a single item using its UUID:

g_tree = nx.DiGraph(nx.bfs_edges(g, 'f6e214b1bba34a01bd0c18f232d6aee2', reverse=True))
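To show what reverse=True does, here is a small sketch on a made-up lineage (all node labels are hypothetical). The search follows predecessors instead of successors, so it walks upstream from the chosen item to everything that feeds into it, and nx.bfs_tree() with the same arguments builds the equivalent tree in one call.

```python
import networkx as nx

# Hypothetical lineage: raw -> staged -> report -> dashboard, plus 'other' -> report.
g = nx.DiGraph([('raw', 'staged'), ('staged', 'report'),
                ('other', 'report'), ('report', 'dashboard')])

# Walk upstream from 'report' by following predecessors, as in the question.
up_edges = list(nx.bfs_edges(g, 'report', reverse=True))
g_tree = nx.DiGraph(up_edges)

# bfs_tree runs the same traversal and returns a DiGraph directly.
g_tree2 = nx.bfs_tree(g, 'report', reverse=True)

print(up_edges)
print(sorted(g_tree.edges()) == sorted(g_tree2.edges()))
```

Note that the resulting edges point from the item towards its sources ('report' to 'staged', then 'staged' to 'raw'), and downstream nodes such as 'dashboard' are not included.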

So far so good. The last step is to generate the JSON graph so that I can feed the resulting JSON file to D3.js in order to perform the visualisation:

# Create the JSON data structure
from networkx.readwrite import json_graph
data = json_graph.tree_data(g_tree, root='f6e214b1bba34a01bd0c18f232d6aee2')
# Write the tree to a JSON file
import json
with open('./tree.json', 'w') as f:
    json.dump(data, f)

All of the above works. However, the JSON data contains the UUIDs instead of the node names, because the node attributes are dropped when g_tree is built from the edges returned by nx.bfs_edges().
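The attribute loss is easy to reproduce on a toy graph (hypothetical node IDs and names, assuming NetworkX 2.0 or later for G.nodes[n]). Building a DiGraph from the bare (u, v) pairs yielded by bfs_edges() creates each node with an empty attribute dict, so tree_data() only has the id to emit.

```python
import json
import networkx as nx
from networkx.readwrite import json_graph

# Hypothetical graph whose nodes carry a 'name' attribute.
g = nx.DiGraph()
g.add_nodes_from([('a1', {'name': 'foo'}), ('b2', {'name': 'bar'})])
g.add_edge('a1', 'b2')

# bfs_edges yields plain (u, v) pairs, so the new graph's nodes have no attributes.
g_tree = nx.DiGraph(nx.bfs_edges(g, 'a1'))
print(g.nodes['b2'], g_tree.nodes['b2'])

data = json_graph.tree_data(g_tree, root='a1')
print(json.dumps(data))
```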

Example:

[Image: tree example]

Not a problem (at least that's what I thought); I'll just update the nodes in g_tree with the attributes from g.

obj_names = nx.get_node_attributes(g, 'name')
for obj_id, obj_name in obj_names.items():
    try:
        g_tree[obj_id]['name'] = obj_name
    except Exception:
        pass

Note: I can't use set_node_attributes() as g contains more nodes than g_tree, which causes a KeyError.

If I then try to generate the JSON data again:

data = json_graph.tree_data(g_tree, root='f6e214b1bba34a01bd0c18f232d6aee2')

it will throw the error:

TypeError: G is not a tree.

This is because tree_data() requires the number of nodes to equal the number of edges + 1.

Before setting the attributes, the number of nodes was 81 and the number of edges 80. After setting the attributes, the number of edges increased to 120 (number of nodes remained the same).
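The condition behind the error can be checked before calling tree_data(). The sketch below assumes NetworkX 2.0 or later; check_tree is a hypothetical helper name, and nx.is_arborescence() is NetworkX's test for a directed rooted tree, which is stricter than the bare node/edge count (it also requires the graph to be connected with at most one parent per node).

```python
import networkx as nx

def check_tree(g):
    """Hypothetical helper: report the counts and whether g is a rooted directed tree."""
    n, m = g.number_of_nodes(), g.number_of_edges()
    print(f'nodes={n} edges={m} nodes==edges+1: {n == m + 1}')
    return nx.is_arborescence(g)

tree = nx.DiGraph([('root', 'x'), ('root', 'y'), ('x', 'z')])
print(check_tree(tree))        # 4 nodes, 3 edges: a valid rooted tree

not_tree = tree.copy()
not_tree.add_edge('y', 'z')    # 'z' now has two parents: one edge too many
print(check_tree(not_tree))
```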

OK, as for my questions:

  1. Am I taking the long way around, and is there a much shorter/better/faster way to generate the same result?
  2. What is causing the number of edges to increase when I'm only setting the attributes for nodes?
  3. Is there a way to retain the node attributes when trying to generate the tree?


Answers (1)

unutbu

Per the warning in the docs regarding the dict G[node],

Do not change the returned dict – it is part of the graph data structure and direct manipulation may leave the graph in an inconsistent state.

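Thus, assignment to g_tree[obj_id] is a no-no. g_tree[obj_id] is the adjacency dict of obj_id (its successors), not its attribute dict, so in NetworkX 1.x the line below adds a key 'name' to it, which NetworkX then counts as an extra outgoing edge. That is where the additional edges come from: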

g_tree[obj_id]['name'] = obj_name

Instead use G.nodes to modify attributes (G.node in NetworkX 1.x; that alias was removed in NetworkX 2.4):

g_tree.nodes[obj_id]['name'] = obj_name
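A small sketch of the difference, assuming NetworkX 2.0 or later (node labels are made up). G[n] is the adjacency mapping of n, which newer releases expose as a read-only view, while G.nodes[n] is the node's attribute dict. With the question's except Exception: pass around it, the failing assignment would simply be swallowed and no names would be set.

```python
import networkx as nx

g_tree = nx.DiGraph([('A', 'B'), ('A', 'C')])

# G[n] is n's adjacency (successor) mapping; in NetworkX >= 2.0 it is a
# read-only view, so writing to it raises TypeError instead of adding an "edge".
raised = False
try:
    g_tree['A']['name'] = 'foo'
except TypeError:
    raised = True

# G.nodes[n] is the attribute dict; writing here leaves the edges untouched.
g_tree.nodes['A']['name'] = 'foo'

print(raised, g_tree.nodes['A'], g_tree.number_of_edges())
```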

Also, once you have g_tree, you can obtain a list of the nodes in g_tree with

In [220]: list(g_tree.nodes())
Out[220]: ['A', 'B', 'C']

and then you can use

for obj_id in g_tree.nodes():
    g_tree.nodes[obj_id].update(g.nodes[obj_id])

to copy the attributes from g to g_tree.
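The set_node_attributes() route from the question can also work. The KeyError described there matches NetworkX 1.x, whose set_node_attributes(G, name, values) indexes every key of values. Assuming NetworkX 2.0 or later, where the signature is set_node_attributes(G, values, name=None) and keys that are not nodes of G are skipped, the names from g can be applied to the smaller g_tree directly. A sketch with hypothetical nodes:

```python
import networkx as nx

# Hypothetical full graph with names, and a tree covering only part of it.
g = nx.DiGraph()
g.add_nodes_from([('a', {'name': 'foo'}), ('b', {'name': 'bar'}), ('c', {'name': 'baz'})])
g.add_edges_from([('a', 'b'), ('b', 'c')])
g_tree = nx.DiGraph([('a', 'b')])  # 'c' is not in the tree

obj_names = nx.get_node_attributes(g, 'name')  # one entry per node of g
# NetworkX >= 2.0 argument order: values, then the attribute name.
# 'c' is not a node of g_tree and is skipped instead of raising KeyError.
nx.set_node_attributes(g_tree, obj_names, 'name')

print(g_tree.nodes(data=True))
print(g_tree.number_of_edges())
```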


import json
import pandas as pd
import networkx as nx
from networkx.readwrite import json_graph

df_objs = pd.DataFrame({'uuid':list('ABCD'), 'name':['foo','bar','baz','quux']})
df_calls = pd.DataFrame({'calling':['A','A'], 'called':['B','C']})
objs = df_objs.set_index('uuid').to_dict(orient='index')
g = nx.DiGraph()
g.add_nodes_from(objs.items())
g.add_edges_from(df_calls[['calling','called']].drop_duplicates().values)

g_tree = nx.DiGraph(nx.bfs_edges(g, 'A'))

# Copy each node's attributes from g into the (attribute-less) tree nodes
for obj_id in g_tree.nodes():
    g_tree.nodes[obj_id].update(g.nodes[obj_id])

print(g_tree.nodes(data=True))
# [('A', {'name': 'foo'}), ('B', {'name': 'bar'}), ('C', {'name': 'baz'})]

data = json_graph.tree_data(g_tree, root='A')
print(json.dumps(data))
# {"name": "foo", "id": "A", "children": [{"name": "bar", "id": "B"},
#  {"name": "baz", "id": "C"}]}
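On question 1, a somewhat shorter route is sketched below, assuming NetworkX 2.0 or later. The helper name lineage_tree_json and the sample data are hypothetical. nx.bfs_tree() builds the upstream tree in one call, and add_nodes_from() with (node, attribute_dict) pairs updates the attributes of nodes that already exist, so the copy loop collapses to one line.

```python
import json
import networkx as nx
from networkx.readwrite import json_graph

def lineage_tree_json(g, root):
    """Hypothetical helper: upstream BFS tree of `root`, keeping g's node attributes."""
    # reverse=True walks predecessors, i.e. the objects that feed into `root`.
    tree = nx.bfs_tree(g, root, reverse=True)
    # For existing nodes, add_nodes_from merges the given attributes into them.
    tree.add_nodes_from((n, g.nodes[n]) for n in list(tree))
    return json_graph.tree_data(tree, root=root)

# Hypothetical call graph: 'e' feeds 't', which feeds 'l', which feeds 'x'.
g = nx.DiGraph()
g.add_nodes_from([('e', {'name': 'extract'}), ('t', {'name': 'transform'}),
                  ('l', {'name': 'load'}), ('x', {'name': 'unrelated'})])
g.add_edges_from([('e', 't'), ('t', 'l'), ('l', 'x')])

data = lineage_tree_json(g, 'l')
print(json.dumps(data))
```

The returned dict can be written with json.dump() exactly as in the question; only the upstream nodes of the root ('t' and 'e' here) end up in it.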
