Question on how to draw a customized network in python

Question

I have a pandas dataframe with the following information:

Year   NodeName   NodeSize
1990   A          50
1990   B          10
1990   C          100
1995   A          90
1995   B          70
1995   C          60
2000   A          150
2000   B          90
2000   C          100
2005   A          55
2005   B          90
2005   C          130

I want the nodes to be placed on a grid, such that every year is a column and every node name is a row, and the size of each node reflects the value in the NodeSize column.

I also have the following edges in a second dataframe:

FromYear ToYear  FromNode    ToNode   EdgeWidth
1990     1995    A           B        60   
1990     1995    A           C        20   
1990     1995    B           A        10   
1990     1995    C           B        10   
1995     2000    A           B        60   
1995     2000    B           A        30   
1995     2000    C           A        10   
1995     2000    C           B        10   
1995     2000    B           C        70   
2000     2005    A           B        10
2000     2005    A           C        60
2000     2005    B           A        60
2000     2005    B           C        25
2000     2005    C           B        44
2000     2005    C           A        10

Each row of this dataframe describes one edge. For example, the first row is an arrow from node A in the 1990 column to node B in the 1995 column, and the width of each edge should be proportional to the number in the EdgeWidth column.

There seem to be a lot of tutorials on networkx, and I would appreciate some guidance.

Here is a rough sketch of how I would like it to look. Each row of nodes should also be a different color, if possible. I would like it to be more of an infographic than a typical network, showing the flow between the nodes over the years.

Here is the code to generate the two dataframes:

import pandas as pd

nodes = pd.DataFrame(
[(1990,'A',50),
(1990,'B',10),
(1990,'C',100),
(1995,'A',90),
(1995,'B',70),
(1995,'C',60),
(2000,'A',150),
(2000,'B',90),
(2000,'C',100),
(2005,'A',55),
(2005,'B',90),
(2005,'C',130)],
columns=['Year','NodeName','NodeSize'])

edges = pd.DataFrame(
[(1990,1995,'A','B',60), 
(1990,1995,'A','C',20),   
(1990,1995,'B','A',10),   
(1990,1995,'C','B',10),  
(1995,2000,'A','B',60),   
(1995,2000,'B','A',30),   
(1995,2000,'C','A',10),   
(1995,2000,'C','B',10),   
(1995,2000,'B','C',70),   
(2000,2005,'A','B',10),
(2000,2005,'A','C',60),
(2000,2005,'B','A',60),
(2000,2005,'B','C',25),
(2000,2005,'C','B',44),
(2000,2005,'C','A',10)],
columns = ['FromYear','ToYear','FromNode','ToNode','EdgeWidth'])

Paul Brodersen · Accepted Answer

Really quite straightforward. Convert the NodeNames to y-coordinates, convert the Years to x-coordinates, and then plot a bunch of Circle and FancyArrow patches.
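The coordinate conversion relies on np.unique with return_inverse=True. As a minimal sketch of what that call returns, here it is applied to a few values in the style of the question's data (the arrays `years` and `names` are made up for illustration):

```python
import numpy as np

# Illustrative inputs only; the real data lives in the nodes dataframe.
years = np.array([1990, 1990, 1995, 2000, 2005, 1995])
names = np.array(['A', 'B', 'C', 'A', 'B', 'C'])

# The first output holds the sorted unique values; the second output holds,
# for every input element, the index of its value in that sorted array.
unique_years, x = np.unique(years, return_inverse=True)
unique_names, y = np.unique(names, return_inverse=True)

# Flip the y-axis so that the first name ends up at the top of the plot.
y_flipped = y.max() - y

print(unique_years)  # [1990 1995 2000 2005]
print(x)             # [0 0 1 2 3 1]
print(y_flipped)     # [2 1 0 2 1 0]
```

Because the indices are consecutive integers, the years end up evenly spaced on the x-axis regardless of the actual gaps between them.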

#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from matplotlib.patches import Circle, FancyArrow

nodes = pd.DataFrame(
    [(1990,'A',50),
     (1990,'B',10),
     (1990,'C',100),
     (1995,'A',90),
     (1995,'B',70),
     (1995,'C',60),
     (2000,'A',150),
     (2000,'B',90),
     (2000,'C',100),
     (2005,'A',55),
     (2005,'B',90),
     (2005,'C',130)],
    columns=['Year','NodeName','NodeSize'])

edges = pd.DataFrame(
    [(1990,1995,'A','B',60),
     (1990,1995,'A','C',20),
     (1990,1995,'B','A',10),
     (1990,1995,'C','B',10),
     (1995,2000,'A','B',60),
     (1995,2000,'B','A',30),
     (1995,2000,'C','A',10),
     (1995,2000,'C','B',10),
     (1995,2000,'B','C',70),
     (2000,2005,'A','B',10),
     (2000,2005,'A','C',60),
     (2000,2005,'B','A',60),
     (2000,2005,'B','C',25),
     (2000,2005,'C','B',44),
     (2000,2005,'C','A',10)],
    columns = ['FromYear','ToYear','FromNode','ToNode','EdgeWidth'])

# compute node coordinates: year -> x, letter -> y;
# np.unique(z, return_inverse=True) maps the unique and alphanumerically 
# ordered elements in z to consecutive integers,
# and returns the result as a second output argument
nodes['x'] = np.unique(nodes['Year'], return_inverse=True)[1]
nodes['y'] = np.unique(nodes['NodeName'], return_inverse=True)[1]

# A should be on top, C on bottom
nodes['y'] = np.max(nodes['y']) - nodes['y']

#     Year NodeName  NodeSize  x  y
# 0   1990        A        50  0  2
# 1   1990        B        10  0  1
# 2   1990        C       100  0  0
# 3   1995        A        90  1  2
# 4   1995        B        70  1  1
# 5   1995        C        60  1  0
# 6   2000        A       150  2  2
# 7   2000        B        90  2  1
# 8   2000        C       100  2  0
# 9   2005        A        55  3  2
# 10  2005        B        90  3  1
# 11  2005        C       130  3  0


# compute edge paths
edges = pd.merge(edges, nodes, how='inner', left_on=['FromYear', 'FromNode'], right_on=['Year', 'NodeName'])
edges = pd.merge(edges, nodes, how='inner', left_on=['ToYear', 'ToNode'],     right_on=['Year', 'NodeName'], suffixes=['_start', '_stop'])

#     FromYear  ToYear FromNode ToNode  EdgeWidth  Year_start NodeName_start  NodeSize_start  x_start  y_start  Year_stop NodeName_stop  NodeSize_stop  x_stop  y_stop
# 0       1990    1995        A      B         60        1990              A              50        0        2       1995             B             70       1       1
# 1       1990    1995        C      B         10        1990              C             100        0        0       1995             B             70       1       1
# 2       1990    1995        A      C         20        1990              A              50        0        2       1995             C             60       1       0
# 3       1990    1995        B      A         10        1990              B              10        0        1       1995             A             90       1       2
# 4       1995    2000        A      B         60        1995              A              90        1        2       2000             B             90       2       1
# 5       1995    2000        C      B         10        1995              C              60        1        0       2000             B             90       2       1
# 6       1995    2000        B      A         30        1995              B              70        1        1       2000             A            150       2       2
# 7       1995    2000        C      A         10        1995              C              60        1        0       2000             A            150       2       2
# 8       1995    2000        B      C         70        1995              B              70        1        1       2000             C            100       2       0
# 9       2000    2005        A      B         10        2000              A             150        2        2       2005             B             90       3       1
# 10      2000    2005        C      B         44        2000              C             100        2        0       2005             B             90       3       1
# 11      2000    2005        A      C         60        2000              A             150        2        2       2005             C            130       3       0
# 12      2000    2005        B      C         25        2000              B              90        2        1       2005             C            130       3       0
# 13      2000    2005        B      A         60        2000              B              90        2        1       2005             A             55       3       2
# 14      2000    2005        C      A         10        2000              C             100        2        0       2005             A             55       3       2

fig, ax = plt.subplots()

rescale_by = 1./600 # trial and error

# draw edges first
for _, edge in edges.iterrows():
    x, y = edge[['x_start', 'y_start']]
    dx, dy = edge[['x_stop', 'y_stop']].values - edge[['x_start', 'y_start']].values
    ax.add_patch(FancyArrow(x, y, dx, dy, width=rescale_by*edge['EdgeWidth'], length_includes_head=True, color='orange'))

# draw nodes second such that they are plotted on top of edges
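for _, node in nodes.iterrows():
    # pass the centre as a plain (x, y) tuple; a label-indexed Series would be
    # looked up by position inside matplotlib, which newer pandas versions deprecate
    ax.add_patch(Circle((node['x'], node['y']), rescale_by*node['NodeSize'], facecolor='w', edgecolor='k'))
    ax.text(node['x'], node['y'], node['NodeSize'], ha='center', va='center')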

# annotate nodes
for _, node in nodes[['NodeName', 'y']].drop_duplicates().iterrows():
    ax.text(-0.5, node['y'], node['NodeName'], fontsize=15, fontweight='bold', ha='center', va='center')

for _, node in nodes[['Year', 'x']].drop_duplicates().iterrows():
    ax.text(node['x'], -0.5, node['Year'], fontsize=15, fontweight='bold', ha='center', va='center')

# adjust axis limits to include labels
ax.autoscale_view()
_, xmax = ax.get_xlim()
ax.set_xlim(-1, xmax)

# style axis
ax.set_aspect('equal')
ax.axis('off')

plt.show()
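The question also asks for each row of nodes to have its own color, which the script above does not do (all circles are white). One possible way to get there, sketched under the assumption that the default matplotlib color cycle ('C0', 'C1', ...) is an acceptable palette, is to build a name-to-color lookup and pass it as the facecolor. The `row_colors` dictionary is a hypothetical helper, and the node table is cut down to two years for brevity:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Circle

# Reduced copy of the question's node table (two years only).
nodes = pd.DataFrame(
    [(1990, 'A', 50), (1990, 'B', 10), (1990, 'C', 100),
     (1995, 'A', 90), (1995, 'B', 70), (1995, 'C', 60)],
    columns=['Year', 'NodeName', 'NodeSize'])

# Same coordinate conversion as in the answer: year -> x, name -> y.
nodes['x'] = np.unique(nodes['Year'], return_inverse=True)[1]
nodes['y'] = np.unique(nodes['NodeName'], return_inverse=True)[1]
nodes['y'] = nodes['y'].max() - nodes['y']

# Hypothetical palette: one entry of the default color cycle per node name.
row_colors = {name: f'C{i}'
              for i, name in enumerate(sorted(nodes['NodeName'].unique()))}

rescale_by = 1. / 600  # same scale factor as in the answer

fig, ax = plt.subplots()
for node in nodes.itertuples(index=False):
    ax.add_patch(Circle((node.x, node.y), rescale_by * node.NodeSize,
                        facecolor=row_colors[node.NodeName], edgecolor='k'))
    ax.text(node.x, node.y, str(node.NodeSize), ha='center', va='center')

ax.autoscale_view()
ax.set_aspect('equal')
ax.axis('off')
```

Call plt.show() afterwards to display the figure. The same lookup could be applied in the edge loop, for example by coloring each arrow by its FromNode instead of using a fixed 'orange'.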
