ricsilo
ricsilo

Reputation: 105

Making Categorical or Grouped Bar Graph with secondary Axis Line Graph

I need to compare different sets of daily data between 4 shifts(categorical / groupby), using bar graphs and line graphs. I have looked everywhere and have not found a working solution for this that doesn't include generating new pivots and such.

I've used both, matplotlib and seaborn, and while I can do one or the other(different colored bars/lines for each shift), once I incorporate the other, either one disappears, or other anomalies happen like only one plot point shows. I have looked all over and there are solutions for representing a single series of data on both chart types, but none that goes into multi category or grouped for both.

Data Example:

report_date wh_id   shift   Head_Count  UTL_R
3/17/19     55  A   72  25%
3/18/19     55  A   71  10%
3/19/19     55  A   76  20%
3/20/19     55  A   59  33%
3/21/19     55  A   65  10%
3/22/19     55  A   54  20%
3/23/19     55  A   66  14%
3/17/19     55  1   11  10%
3/17/19     55  2   27  13%
3/17/19     55  3   18  25%
3/18/19     55  1   23  100%
3/18/19     55  2   16  25%
3/18/19     55  3   12  50%
3/19/19     55  1   28  10%
3/19/19     55  2   23  50%
3/19/19     55  3   14  33%
3/20/19     55  1   29  25%
3/20/19     55  2   29  25%
3/20/19     55  3   10  50%
3/21/19     55  1   17  20%
3/21/19     55  2   29  14%
3/21/19     55  3   30  17%
3/22/19     55  1   12  14%
3/22/19     55  2   10  100%
3/22/19     55  3   17  14%
3/23/19     55  1   16  10%
3/23/19     55  2   11  100%
3/23/19     55  3   13  10%
tm_daily_df = pd.read_csv('fg_TM_Daily.csv')
tm_daily_df = tm_daily_df.set_index('report_date')
fig2, ax2 = plt.subplots(figsize=(12,8))
ax3 = ax2.twinx()
group_obj = tm_daily_df.groupby('shift')
g = group_obj['Head_Count'].plot(kind='bar', x='report_date',  y='Head_Count',ax=ax2,stacked=False,alpha = .2)
g = group_obj['UTL_R'].plot(kind='line',x='report_date', y='UTL_R', ax=ax3,marker='d', markersize=12)
plt.legend(tm_daily_df['shift'].unique())

This code has gotten me the closest I've been able to get. Notice that even with stacked = False, they are still stacked. I changed the setting to True, and nothing changes.

All i need is for the bars to be next to each other with the same color scheme representative of the shift

The graph:

graph

Upvotes: 2

Views: 2665

Answers (2)

Teddy
Teddy

Reputation: 645

Here are two solutions (stacked and unstacked). Based on your questions we will:

  • plot Head_Count in the left y axis and UTL_R in the right y axis.
  • report_date will be our x axis
  • shift will represent the hue of our graph.

The stacked version uses pandas default plotting feature, and the unstacked version uses seaborn.

EDIT
From your request, I added a 100% stacked graph. While it is not quite exactly what you asked in the comment, the graph type you asked may create some confusion when reading (are the values based on the upper line of the stack or the width of the stack). An alternative solution may be using a 100% stacked graph.

Stacked

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])

fig, ax = plt.subplots(figsize=(12,6))

ax2  = ax.twinx()

dfg['Head_Count'].unstack().plot.bar(stacked=True, ax=ax, alpha=0.6)
dfg['UTL_R'].unstack().plot(kind='line', ax=ax2, marker='o', legend=None)

ax.set_title('My Graph')
plt.show()

enter image description here

Stacked 100%

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])

# Create `Head_Count_Pct` column
for date in dfg.index.get_level_values('report_date').unique():
    for shift in dfg.loc[date, :].index.get_level_values('shift').unique():
        dfg.loc[(date, shift), 'Head_Count_Pct'] = dfg.loc[(date, shift), 'Head_Count'].sum() / dfg.loc[(date, 'A'), 'Head_Count'].sum()

fig, ax = plt.subplots(figsize=(12,6))

ax2  = ax.twinx()
pal = sns.color_palette("Set1")

dfg[dfg.index.get_level_values('shift').isin(['1','2','3'])]['Head_Count_Pct'].unstack().plot.bar(stacked=True, ax=ax, alpha=0.5, color=pal)
dfg['UTL_R'].unstack().plot(kind='line', ax=ax2, marker='o', legend=None, color=pal)

ax.set_title('My Graph')
plt.show()

enter image description here

Unstacked

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])

fig, ax = plt.subplots(figsize=(15,6))

ax2  = ax.twinx()

sns.barplot(x=dfg.index.get_level_values('report_date'),
            y=dfg.Head_Count,
           hue=dfg.index.get_level_values('shift'), ax=ax, alpha=0.7)

sns.lineplot(x=dfg.index.get_level_values('report_date'),
            y=dfg.UTL_R,
           hue=dfg.index.get_level_values('shift'), ax=ax2, marker='o', legend=None)

ax.set_title('My Graph')
plt.show()

enter image description here


EDIT #2

Here is the graph as you requested in a second time (stacked, but stack n+1 does not start where stack n ends).

It is slightly more involving as we have to do multiple things: - we need to manually assign our color to our shift in our df - once we have our colors assign, we will iterate through each date range and 1) sort or Head_Count values descending (so that our largest sack is in the back when we plot the graph), and 2) plot the data and assign the color to each stacj - Then we can create our second y axis and plot our UTL_R values - Then we need to assign the correct color to our legend labels

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def assignColor(shift):
    if shift == 'A':
        return 'R'
    if shift == '1':
        return 'B'
    if shift == '2':
        return 'G'
    if shift == '3':
        return 'Y'

# map a color to a shift
df['color'] = df['shift'].apply(assignColor)

fig, ax = plt.subplots(figsize=(15,6))

# plot our Head_Count values
for date in df.report_date.unique():
    d = df[df.report_date == date].sort_values(by='Head_Count', ascending=False)
    y = d.Head_Count.values
    x = date
    color = d.color
    b = plt.bar(x,y, color=color)

# Plot our UTL_R values
ax2 = ax.twinx()    

sns.lineplot(x=df.report_date, y=df.UTL_R, hue=df['shift'], marker='o', legend=None)

# Assign the color label color to our legend
leg = ax.legend(labels=df['shift'].unique(), loc=1)

legend_maping = dict()

for shift in df['shift'].unique():
    legend_maping[shift] = df[df['shift'] == shift].color.unique()[0]

i = 0
for leg_lab in leg.texts:
    leg.legendHandles[i].set_color(legend_maping[leg_lab.get_text()])
    i += 1

enter image description here

Upvotes: 1

josemz
josemz

Reputation: 1312

How about this?

tm_daily_df['UTL_R'] = tm_daily_df['UTL_R'].str.replace('%', '').astype('float') / 100
pivoted = tm_daily_df.pivot_table(values=['Head_Count', 'UTL_R'], 
                                  index='report_date', 
                                  columns='shift')
pivoted

#             Head_Count             UTL_R
# shift                1   2   3   A     1     2     3     A
# report_date
# 3/17/19             11  27  18  72  0.10  0.13  0.25  0.25
# 3/18/19             23  16  12  71  1.00  0.25  0.50  0.10
# 3/19/19             28  23  14  76  0.10  0.50  0.33  0.20
# 3/20/19             29  29  10  59  0.25  0.25  0.50  0.33
# 3/21/19             17  29  30  65  0.20  0.14  0.17  0.10
# 3/22/19             12  10  17  54  0.14  1.00  0.14  0.20
# 3/23/19             16  11  13  66  0.10  1.00  0.10  0.14

fig, ax = plt.subplots()
pivoted['Head_Count'].plot.bar(ax=ax)
pivoted['UTL_R'].plot.line(ax=ax, legend=False, secondary_y=True, marker='D')
ax.legend(loc='upper left', title='shift')

Upvotes: 1

Related Questions