GKC

Reputation: 479

Stacking bar plot using pandas

I want to represent my data as a bar plot, as shown in my expected output.

[Image: expected output]

time,date,category
0,2002-05-01,2
1,2002-05-02,0
2,2002-05-03,0
3,2002-05-04,0
4,2002-05-05,0
5,2002-05-06,0
6,2002-05-07,0
7,2002-05-08,2
8,2002-05-09,2
9,2002-05-10,0
10,2002-05-11,2
11,2002-05-12,0
12,2002-05-13,0
13,2002-05-14,2
14,2002-05-15,2
15,2002-05-16,2
16,2002-05-17,2
17,2002-05-18,2
18,2002-05-19,0
19,2002-05-20,0
20,2002-05-21,1
21,2002-05-22,2
22,2002-05-23,0
23,2002-05-24,1
24,2002-05-25,0
25,2002-05-26,0
26,2002-05-27,0
27,2002-05-28,0
28,2002-05-29,1
29,2002-05-30,0

import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt

df = pd.read_csv('df.csv')
daily_category = df[['date','category']]
daily_category['weekday'] = pd.to_datetime(daily_category['date']).dt.day_name()
daily_category_plot = daily_category[['weekday','category']]

daily_category_plot[['category']].groupby('weekday').count().plot(kind='bar', legend=None)
plt.show()

However, I get the following error:

Traceback (most recent call last):
  File "day_plot.py", line 10, in <module>
    daily_category_plot[['category']].groupby('weekday').count().plot(kind='bar', legend=None)
  File "/home/..../.local/lib/python3.6/site-packages/pandas/core/frame.py", line 6525, in groupby
    dropna=dropna,
  File "/home/..../.local/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 533, in __init__
    dropna=self.dropna,
  File "/home/..../.local/lib/python3.6/site-packages/pandas/core/groupby/grouper.py", line 786, in get_grouper
    raise KeyError(gpr)
KeyError: 'weekday'

In a further example below, where I extract the data manually, I get almost the expected output, except that the days are shown as numbers instead of weekday names.

Day,category1,category2,category3
Sunday,0,0,4
Monday,0,0,4
Tuesday,1,1,2
Wednesday,1,4,0
Thursday,0,2,3
Friday,1,1,2
Saturday,0,2,2

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

df = pd.read_csv('df.csv')

ax = df.plot.bar(stacked=True, color=['green', 'red', 'blue'])
ax.set_xticklabels(labels=df.index, rotation=70, rotation_mode="anchor", ha="right")
ax.set_xlabel('')
ax.set_ylabel('Number of days')
plt.show()

Tested output

[Image: tested output]

Updated code producing odd plot

import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt

df = pd.read_csv('df.csv')
daily_category = df[['time','date','category']]
daily_category['weekday'] = pd.to_datetime(daily_category['date']).dt.day_name()

ans = (daily_category.groupby(['weekday', 'category']) 
         .size()
         .reset_index(name='sum')
         .pivot(index='weekday', columns='category', values='sum')
      )

ans.plot.bar(stacked=True)
plt.show()

Updated output

[Image: updated output]

Upvotes: 0

Views: 1442

Answers (2)

mosc9575

Reputation: 6367

This solution uses groupby on two columns and reshapes the result with pivot. The resulting DataFrame can be plotted with plot.bar(), but it has the wrong labels, so the index is replaced with weekday names.

I copied and pasted your data and built a DataFrame with:

import pandas as pd
from io import StringIO
t = """time,date,category
0,2002-05-01,2
..."""
df = pd.read_csv(StringIO(t))
# Weekday as an integer: Monday = 0 ... Sunday = 6
df['weekday'] = pd.to_datetime(df['date']).dt.weekday

To check the expected output for the Wednesday bar, I filter the rows where weekday is 2:

>>> df[df['weekday'] == 2]
     time        date  category  weekday
0      0  2002-05-01         2        2
7      7  2002-05-08         2        2
14    14  2002-05-15         2        2
21    21  2002-05-22         2        2
28    28  2002-05-29         1        2

So for Wednesday I expect to see only category 1 (1 of 5 rows) and category 2 (4 of 5 rows).
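As a quick cross-check of that Wednesday expectation, the shares can be computed directly. This is only a minimal sketch: it rebuilds the question's 30 rows with pd.date_range instead of reading df.csv (an assumption made to keep the example self-contained), and the name wednesday_share is illustrative.

```python
import pandas as pd

# Category values copied from the question, one per day starting 2002-05-01.
categories = [2, 0, 0, 0, 0, 0, 0, 2, 2, 0,
              2, 0, 0, 2, 2, 2, 2, 2, 0, 0,
              1, 2, 0, 1, 0, 0, 0, 0, 1, 0]

# Assumption: consecutive daily dates, as in the question's sample data.
df = pd.DataFrame({
    "date": pd.date_range("2002-05-01", periods=len(categories)),
    "category": categories,
})
df["weekday"] = df["date"].dt.weekday  # Monday = 0 ... Sunday = 6

# Share of each category among the Wednesday rows (weekday == 2).
wednesday_share = (
    df.loc[df["weekday"] == 2, "category"]
      .value_counts(normalize=True)
      .sort_index()
)
print(wednesday_share)  # category 1 -> 0.2, category 2 -> 0.8
```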

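# Count rows per (weekday, category), then pivot the categories into columns.
ans = (df.groupby(["weekday", "category"])
         .size()
         .reset_index(name="sum")
         .pivot(index='weekday', columns='category', values='sum')
      )
# Weekday numbers start at Monday = 0, so the sorted index 0..6 maps onto these names.
# This assignment assumes all seven weekdays occur in the data.
ans.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
ans.plot.bar(stacked=True)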

stacked bar plot
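If you would rather keep the weekday names from dt.day_name(), as in the question's updated code, the alphabetical bar order can be avoided by reindexing. The following is a hedged sketch of that variant, not the code used for the plot above: it swaps groupby/pivot for pd.crosstab, rebuilds the sample data with pd.date_range to stay self-contained, and WEEKDAYS and plot_counts are hypothetical names.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Category values copied from the question, one per day starting 2002-05-01.
categories = [2, 0, 0, 0, 0, 0, 0, 2, 2, 0,
              2, 0, 0, 2, 2, 2, 2, 2, 0, 0,
              1, 2, 0, 1, 0, 0, 0, 0, 1, 0]

# Assumption: consecutive daily dates, as in the question's sample data.
df = pd.DataFrame({
    "date": pd.date_range("2002-05-01", periods=len(categories)),
    "category": categories,
})
df["weekday"] = df["date"].dt.day_name()

# crosstab counts rows per (weekday, category) and fills missing pairs with 0;
# reindex puts the rows in calendar order instead of alphabetical order.
counts = pd.crosstab(df["weekday"], df["category"]).reindex(WEEKDAYS, fill_value=0)


def plot_counts(table):
    """Draw one stacked bar per weekday (imports matplotlib only when called)."""
    import matplotlib.pyplot as plt

    ax = table.plot.bar(stacked=True)
    ax.set_xlabel("")
    ax.set_ylabel("Number of days")
    plt.show()
```

Calling plot_counts(counts) draws the stacked bars with the weekday names in calendar order. Because reindex is given fill_value=0, a weekday that never occurs in the data would still get an (empty) bar rather than breaking a hard-coded index assignment.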

Upvotes: 1

Michael Hodel

Reputation: 3028

import pandas as pd
import matplotlib.pyplot as plt

d = """0,2002-05-01,2  1,2002-05-02,0  2,2002-05-03,0  3,2002-05-04,0  4,2002-05-05,0  5,2002-05-06,0  6,2002-05-07,0  7,2002-05-08,2  8,2002-05-09,2  9,2002-05-10,0  10,2002-05-11,2  11,2002-05-12,0  12,2002-05-13,0  13,2002-05-14,2  14,2002-05-15,2  15,2002-05-16,2  16,2002-05-17,2  17,2002-05-18,2  18,2002-05-19,0  19,2002-05-20,0  20,2002-05-21,1  21,2002-05-22,2  22,2002-05-23,0  23,2002-05-24,1  24,2002-05-25,0  25,2002-05-26,0  26,2002-05-27,0  27,2002-05-28,0  28,2002-05-29,1  29,2002-05-30,0"""
df = pd.DataFrame([v.split(',') for v in d.split('  ')], columns=['time', 'date', 'category'])
df.time, df.category = df.time.astype(int), df.category.astype(int)

# Work on a copy and keep only the weekday name and the category.
data = df.copy()
data['weekday'] = pd.to_datetime(data['date']).dt.day_name()
data.drop(columns=['time', 'date'], inplace=True)

# Start from a zero-filled weekday x category table so the rows stay in calendar order.
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
categories = sorted(set(df.category))
counts = pd.DataFrame(0, index=weekdays, columns=categories)
# Tally one count per row of the data.
for weekday, category in zip(data.weekday, data.category):
    counts.loc[weekday, category] += 1

counts.plot.bar(stacked=True)
plt.show()

[Image: resulting stacked bar plot]

Upvotes: 1
