baloo

Reputation: 527

Sankey bar chart diagram with pandas or Python

I would like to make a bar chart diagram like this one, using any Python module that I can interface with matplotlib:

Sankey stacked bar chart

Below is some example data and an explanation of what I can do so far:

import pandas
from io import StringIO

text="""
Name                           1980              1982
A                    Administration            Budget
B                    Administration    Administration
C                    Administration    Administration
D                    Administration            Budget
E                    Administration            Budget
F                    Administration    Administration
G                    Administration    Administration
H                    Administration    Administration
"""

data=pandas.read_fwf(StringIO(text),header=1).set_index("Name")

count=pandas.DataFrame(index=["Administration","Budget"])
for col in data.columns:
    count[col]=data[col].value_counts()

count.T.plot(kind="bar",stacked=True)

When I plot count, I get the following stacked bar chart:

Stacked bar chart

I can also get the number of people who moved between 1980 and 1982 from the Administration department to the Budget department by doing

pandas.crosstab(data["1980"],data["1982"])

which gives:

1982            Administration  Budget
1980                                  
Administration               5       3

However, I don't know how to draw the flows between each part of the bar chart. Does anyone know how?

Upvotes: 4

Views: 2959

Answers (1)

S.Honcharov

Reputation: 43

You can use the pandas functions crosstab and melt to prepare your data for a Sankey diagram:

from io import StringIO
import pandas as pd

text = """
Name                           1980              1982
A                    Administration            Budget
B                    Administration    Administration
C                    Administration    Administration
D                    Administration            Budget
E                    Administration            Budget
F                    Administration    Administration
G                    Administration    Administration
H                    Administration    Administration
"""
data = pd.read_fwf(StringIO(text),header=1)
    
# Make crosstab
data_cross = pd.crosstab(data['1980'], data['1982'])
print(data_cross)

# Make flat table
data_tidy = data_cross.rename_axis(None, axis=1).reset_index().copy()

# Make tidy table
formatted_data = pd.melt(data_tidy,
                         ['1980'],
                         var_name='1982',
                         value_name='Value')

import plotly.graph_objects as go
    
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.5),
        # One node per bar segment: 1980 Administration, 1982 Administration, 1982 Budget
        label=["Administration", "Administration", "Budget"],
        color=['blue', 'blue', 'green']
    ),
    link=dict(
        source=[0, 0],  # indices into the label list above
        target=[1, 2],
        value=[5, 3],   # counts taken from the crosstab
        color=['lightblue', 'lightgreen']
    ))])

fig.update_layout(title_text="Basic Sankey Diagram", font_size=10)
fig.show()

This produces the following output:

Snapshot of figure
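
The label, source, target and value lists above are typed in by hand. As a rough sketch of how they could be derived from the two columns instead, the helper below counts each (1980, 1982) pair and maps it to node indices. The name build_sankey_links is hypothetical and not part of pandas or plotly. It uses only the standard library and accepts any two equal-length sequences, so it should also work when passed data['1980'] and data['1982'].

```python
from collections import Counter


def build_sankey_links(before, after):
    """Hypothetical helper: count transitions between two category columns.

    Returns (labels, source, target, value), where source and target are
    indices into labels, which is the layout a Sankey trace expects.
    """
    # Count how many rows go from each "before" category to each "after" one.
    flows = Counter(zip(before, after))

    left = sorted({src for src, _ in flows})
    right = sorted({dst for _, dst in flows})

    # Left-hand nodes come first; right-hand nodes are offset by len(left),
    # so a category present in both years gets one node per side.
    labels = left + right
    left_index = {name: i for i, name in enumerate(left)}
    right_index = {name: len(left) + i for i, name in enumerate(right)}

    source, target, value = [], [], []
    for (src, dst), n in sorted(flows.items()):
        source.append(left_index[src])
        target.append(right_index[dst])
        value.append(n)
    return labels, source, target, value


# Same data as in the question, written out as plain lists.
before = ["Administration"] * 8
after = ["Budget", "Administration", "Administration", "Budget",
         "Budget", "Administration", "Administration", "Administration"]

labels, source, target, value = build_sankey_links(before, after)
print(labels)
print(source, target, value)
```

For the example data this gives the same lists as the hard-coded ones, so they can be passed directly as node=dict(label=labels) and link=dict(source=source, target=target, value=value) in go.Sankey.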

Upvotes: 2
