Reputation: 875
I have a NetworkX graph called G, created below:
import networkx as nx
G = nx.Graph()
G.add_node(1,job= 'teacher', boss = 'dee')
G.add_node(2,job= 'teacher', boss = 'foo')
G.add_node(3,job= 'admin', boss = 'dee')
G.add_node(4,job= 'admin', boss = 'lopez')
I would like to store the node number along with the attributes job and boss in separate columns of a pandas dataframe.
I have attempted to do this with the code below, but it produces a dataframe with 2 columns: one with the node number and one with all of the attributes:
graph = G.nodes(data = True)
import pandas as pd
df = pd.DataFrame(graph)
df
Out[19]:
   0                                      1
0  1  {u'job': u'teacher', u'boss': u'dee'}
1  2  {u'job': u'teacher', u'boss': u'foo'}
2  3    {u'job': u'admin', u'boss': u'dee'}
3  4  {u'job': u'admin', u'boss': u'lopez'}
Note: I acknowledge that NetworkX has a to_pandas_dataframe function, but it does not provide a dataframe with the output I am looking for.
Upvotes: 13
Views: 10477
Reputation: 79
I have solved this with a dictionary comprehension.
d = {n:dag.nodes[n] for n in dag.nodes}
df = pd.DataFrame.from_dict(d, orient='index')
Your dictionary d maps each node n to dag.nodes[n].
Each value of that dictionary, dag.nodes[n], is itself a dictionary that contains all of the node's attributes: {attribute_name: attribute_value}.
So your dictionary d has the form:
{node_id : {attribute_name : attribute_value} }
The advantage I see is that you do not need to know the names of your attributes.
If you want the node IDs in a column rather than as the index, you could add this as the last command:
df.reset_index(drop=False, inplace=True)
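Putting the steps above together on the graph from the question, a minimal sketch might look like the following. The column name 'node' and the use of rename_axis are my own choices for illustration, not something the question requires.

```python
import networkx as nx
import pandas as pd

# Rebuild the small example graph from the question.
dag = nx.Graph()
dag.add_node(1, job='teacher', boss='dee')
dag.add_node(2, job='teacher', boss='foo')
dag.add_node(3, job='admin', boss='dee')
dag.add_node(4, job='admin', boss='lopez')

# Map each node ID to its attribute dict: {node_id: {attribute_name: attribute_value}}.
d = {n: dag.nodes[n] for n in dag.nodes}

# One row per node, one column per attribute; node IDs become the index.
df = pd.DataFrame.from_dict(d, orient='index')

# Name the index 'node' (an arbitrary label) and move it into a regular column.
df = df.rename_axis('node').reset_index()
print(df)
```

Naming the index first means the new column is called 'node' instead of the default 'index'.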
Upvotes: 0
Reputation: 6980
I think this is even simpler:
pandas.DataFrame.from_dict(graph.nodes, orient='index')
This avoids having to convert to another dict.
Upvotes: 6
Reputation: 476
Here's a one-liner.
pd.DataFrame.from_dict(dict(graph.nodes(data=True)), orient='index')
Upvotes: 34
Reputation: 104
I updated this solution to work with my updated version of NetworkX (2.0) and thought I would share. I also had the function return a Pandas DataFrame.
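def nodes_to_df(graph):
    import pandas as pd
    # Materialize the (node, attribute_dict) pairs once.
    nodes = list(graph.nodes(data=True))
    data = {}
    data['node'] = [x[0] for x in nodes]
    # Take the attribute names from the first node instead of assuming a node
    # labelled 0 exists; this still assumes every node has the same attributes.
    other_cols = nodes[0][1].keys()
    for key in other_cols:
        data[key] = [x[1][key] for x in nodes]
    return pd.DataFrame(data)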
Upvotes: 1
Reputation: 393903
I don't know how representative your data is, but it should be straightforward to modify my code to work on your real network:
In [32]:
data={}
data['node']=[x[0] for x in graph]
data['boss'] = [x[1]['boss'] for x in graph]
data['job'] = [x[1]['job'] for x in graph]
df1 = pd.DataFrame(data)
df1
Out[32]:
    boss      job  node
0    dee  teacher     1
1    foo  teacher     2
2    dee    admin     3
3  lopez    admin     4
All I'm doing here is constructing a dict from the graph data. pandas accepts a dict as data, where the keys are the column names and the values have to be array-like, in this case lists of values.
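To illustrate just the pandas side of this, here is a minimal sketch with the lists written out by hand. The values are copied from the example in the question; nothing here touches NetworkX.

```python
import pandas as pd

# Each key becomes a column name; each list holds that column's values, in row order.
data = {
    'node': [1, 2, 3, 4],
    'job': ['teacher', 'teacher', 'admin', 'admin'],
    'boss': ['dee', 'foo', 'dee', 'lopez'],
}

# All lists must have the same length, otherwise pandas raises a ValueError.
df = pd.DataFrame(data)
print(df)
```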
A more dynamic method:
In [42]:
def func(graph):
    data={}
    data['node']=[x[0] for x in graph]
    other_cols = graph[0][1].keys()
    for key in other_cols:
        data[key] = [x[1][key] for x in graph]
    return data

pd.DataFrame(func(graph))
Out[42]:
    boss      job  node
0    dee  teacher     1
1    foo  teacher     2
2    dee    admin     3
3  lopez    admin     4
Upvotes: 2