I have a pandas dataframe with the following information:
Year NodeName NodeSize
1990 A 50
1990 B 10
1990 C 100
1995 A 90
1995 B 70
1995 C 60
2000 A 150
2000 B 90
2000 C 100
2005 A 55
2005 B 90
2005 C 130
I want the nodes laid out on a grid, with one column per year and one row per node name, and with each node's size reflecting its NodeSize value.
I also have the edges in a second dataframe:
FromYear ToYear FromNode ToNode EdgeWidth
1990 1995 A B 60
1990 1995 A C 20
1990 1995 B A 10
1990 1995 C B 10
1995 2000 A B 60
1995 2000 B A 30
1995 2000 C A 10
1995 2000 C B 10
1995 2000 B C 70
2000 2005 A B 10
2000 2005 A C 60
2000 2005 B A 60
2000 2005 B C 25
2000 2005 C B 44
2000 2005 C A 10
The second dataframe describes the edges. For example, the first row is an arrow from node A under column 1990 to node B under column 1995, and the width of the edge is proportional to the number in the EdgeWidth column.
There seem to be a lot of tutorials on networkx, and I would appreciate some guidance.
Here is a rough sketch of how I would like it to look. Each row of nodes should also be a different color, if possible. I would like it to be more of an infographic than a typical network, showing the flow between the nodes over the years.
Here is the code to generate the two dataframes:
import pandas as pd
nodes = pd.DataFrame(
[(1990,'A',50),
(1990,'B',10),
(1990,'C',100),
(1995,'A',90),
(1995,'B',70),
(1995,'C',60),
(2000,'A',150),
(2000,'B',90),
(2000,'C',100),
(2005,'A',55),
(2005,'B',90),
(2005,'C',130)],
columns=['Year','NodeName','NodeSize'])
edges = pd.DataFrame(
[(1990,1995,'A','B',60),
(1990,1995,'A','C',20),
(1990,1995,'B','A',10),
(1990,1995,'C','B',10),
(1995,2000,'A','B',60),
(1995,2000,'B','A',30),
(1995,2000,'C','A',10),
(1995,2000,'C','B',10),
(1995,2000,'B','C',70),
(2000,2005,'A','B',10),
(2000,2005,'A','C',60),
(2000,2005,'B','A',60),
(2000,2005,'B','C',25),
(2000,2005,'C','B',44),
(2000,2005,'C','A',10)],
columns = ['FromYear','ToYear','FromNode','ToNode','EdgeWidth'])
Reputation: 13031
Really quite straightforward. Convert the NodeName values to y-coordinates, convert the Year values to x-coordinates, and then plot a bunch of Circle and FancyArrow patches.
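To make the coordinate conversion concrete before the full script, here is a minimal sketch of the mapping on a handful of made-up labels. The arrays `names` and `years` are illustrative only and are not the question's data; the only library call used is `np.unique` with `return_inverse=True`.

```python
import numpy as np

# illustrative labels only, not the question's dataframe
names = np.array(['A', 'B', 'C', 'A', 'C'])
years = np.array([1990, 1990, 1990, 1995, 1995])

# return_inverse=True gives, for each element, the index of its value
# in the sorted array of unique values
unique_years, x = np.unique(years, return_inverse=True)
unique_names, y = np.unique(names, return_inverse=True)

# flip the y-axis so that 'A' ends up on top and 'C' at the bottom
y = y.max() - y

print(x)  # [0 0 0 1 1]
print(y)  # [2 1 0 2 0]
```

Each distinct year becomes one integer column index and each distinct name one integer row index, which is all the plotting code needs.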
#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Circle, FancyArrow
nodes = pd.DataFrame(
[(1990,'A',50),
(1990,'B',10),
(1990,'C',100),
(1995,'A',90),
(1995,'B',70),
(1995,'C',60),
(2000,'A',150),
(2000,'B',90),
(2000,'C',100),
(2005,'A',55),
(2005,'B',90),
(2005,'C',130)],
columns=['Year','NodeName','NodeSize'])
edges = pd.DataFrame(
[(1990,1995,'A','B',60),
(1990,1995,'A','C',20),
(1990,1995,'B','A',10),
(1990,1995,'C','B',10),
(1995,2000,'A','B',60),
(1995,2000,'B','A',30),
(1995,2000,'C','A',10),
(1995,2000,'C','B',10),
(1995,2000,'B','C',70),
(2000,2005,'A','B',10),
(2000,2005,'A','C',60),
(2000,2005,'B','A',60),
(2000,2005,'B','C',25),
(2000,2005,'C','B',44),
(2000,2005,'C','A',10)],
columns = ['FromYear','ToYear','FromNode','ToNode','EdgeWidth'])
# compute node coordinates: year -> x, letter -> y;
# np.unique(z, return_inverse=True) maps the unique and alphanumerically
# ordered elements in z to consecutive integers,
# and returns the result as a second output argument
nodes['x'] = np.unique(nodes['Year'], return_inverse=True)[1]
nodes['y'] = np.unique(nodes['NodeName'], return_inverse=True)[1]
# A should be on top, C on bottom
nodes['y'] = np.max(nodes['y']) - nodes['y']
# Year NodeName NodeSize x y
# 0 1990 A 50 0 2
# 1 1990 B 10 0 1
# 2 1990 C 100 0 0
# 3 1995 A 90 1 2
# 4 1995 B 70 1 1
# 5 1995 C 60 1 0
# 6 2000 A 150 2 2
# 7 2000 B 90 2 1
# 8 2000 C 100 2 0
# 9 2005 A 55 3 2
# 10 2005 B 90 3 1
# 11 2005 C 130 3 0
# compute edge paths
edges = pd.merge(edges, nodes, how='inner', left_on=['FromYear', 'FromNode'], right_on=['Year', 'NodeName'])
edges = pd.merge(edges, nodes, how='inner', left_on=['ToYear', 'ToNode'], right_on=['Year', 'NodeName'], suffixes=['_start', '_stop'])
# FromYear ToYear FromNode ToNode EdgeWidth Year_start NodeName_start NodeSize_start x_start y_start Year_stop NodeName_stop NodeSize_stop x_stop y_stop
# 0 1990 1995 A B 60 1990 A 50 0 2 1995 B 70 1 1
# 1 1990 1995 C B 10 1990 C 100 0 0 1995 B 70 1 1
# 2 1990 1995 A C 20 1990 A 50 0 2 1995 C 60 1 0
# 3 1990 1995 B A 10 1990 B 10 0 1 1995 A 90 1 2
# 4 1995 2000 A B 60 1995 A 90 1 2 2000 B 90 2 1
# 5 1995 2000 C B 10 1995 C 60 1 0 2000 B 90 2 1
# 6 1995 2000 B A 30 1995 B 70 1 1 2000 A 150 2 2
# 7 1995 2000 C A 10 1995 C 60 1 0 2000 A 150 2 2
# 8 1995 2000 B C 70 1995 B 70 1 1 2000 C 100 2 0
# 9 2000 2005 A B 10 2000 A 150 2 2 2005 B 90 3 1
# 10 2000 2005 C B 44 2000 C 100 2 0 2005 B 90 3 1
# 11 2000 2005 A C 60 2000 A 150 2 2 2005 C 130 3 0
# 12 2000 2005 B C 25 2000 B 90 2 1 2005 C 130 3 0
# 13 2000 2005 B A 60 2000 B 90 2 1 2005 A 55 3 2
# 14 2000 2005 C A 10 2000 C 100 2 0 2005 A 55 3 2
fig, ax = plt.subplots()
rescale_by = 1./600 # trial and error
# draw edges first
for _, edge in edges.iterrows():
    x, y = edge[['x_start', 'y_start']]
    dx, dy = edge[['x_stop', 'y_stop']].values - edge[['x_start', 'y_start']].values
    ax.add_patch(FancyArrow(x, y, dx, dy, width=rescale_by*edge['EdgeWidth'], length_includes_head=True, color='orange'))
# draw nodes second such that they are plotted on top of edges
for _, node in nodes.iterrows():
    ax.add_patch(Circle(node[['x', 'y']], rescale_by*node['NodeSize'], facecolor='w', edgecolor='k'))
    ax.text(node['x'], node['y'], node['NodeSize'], ha='center', va='center')
# annotate nodes
for _, node in nodes[['NodeName', 'y']].drop_duplicates().iterrows():
    ax.text(-0.5, node['y'], node['NodeName'], fontsize=15, fontweight='bold', ha='center', va='center')
for _, node in nodes[['Year', 'x']].drop_duplicates().iterrows():
    ax.text(node['x'], -0.5, node['Year'], fontsize=15, fontweight='bold', ha='center', va='center')
# adjust axis limits to include labels
ax.autoscale_view()
_, xmax = ax.get_xlim()
ax.set_xlim(-1, xmax)
# style axis
ax.set_aspect('equal')
ax.axis('off')
plt.show()
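The question also asks for a different color per row of nodes, which the script above does not do (every circle is white). One possible way, sketched here on a subset of the node table, is to look up a face color per NodeName. The `row_color` dictionary is a name introduced for this illustration, and picking colors from matplotlib's default color cycle via the 'C0', 'C1', ... notation is an assumption; any mapping from name to color would work.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Circle

# subset of the question's node table, to keep the sketch short
nodes = pd.DataFrame(
    [(1990, 'A', 50), (1990, 'B', 10), (1990, 'C', 100),
     (1995, 'A', 90), (1995, 'B', 70), (1995, 'C', 60)],
    columns=['Year', 'NodeName', 'NodeSize'])

# same coordinate mapping as in the script above
nodes['x'] = np.unique(nodes['Year'], return_inverse=True)[1]
nodes['y'] = np.unique(nodes['NodeName'], return_inverse=True)[1]
nodes['y'] = nodes['y'].max() - nodes['y']

# assumption: one color per node name, taken from the default color cycle
# ('C0', 'C1', ...); row_color is a hypothetical helper, not part of the answer
names = sorted(nodes['NodeName'].unique())
row_color = {name: f'C{i}' for i, name in enumerate(names)}

rescale_by = 1. / 600  # same scale factor as above
fig, ax = plt.subplots()
for _, node in nodes.iterrows():
    ax.add_patch(Circle((node['x'], node['y']),
                        rescale_by * node['NodeSize'],
                        facecolor=row_color[node['NodeName']],
                        edgecolor='k'))

ax.autoscale_view()
ax.set_aspect('equal')
ax.axis('off')
# call plt.show() to display the figure
```

In the full script, the same lookup would replace facecolor='w' in the node loop. The default cycle has ten colors, so an explicit color list is probably the better choice for more than ten rows.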