Reputation: 1323
I have a pandas DataFrame that consists of a date column and a category column of interest. I would like to see the relative frequency of each category for each month. When I plotted this with matplotlib, I got something that looks quite bad.
Here is what the frame looks like when grouped by month:
df.resample("M")["category_col"].value_counts(normalize=True).mul(100)
Output
date category_col
2019-12-31 A 41.929004
B 25.758765
C 17.752111
D 9.189919
E 3.625122
F 1.745080
2020-01-31 A 54.052744
C 16.347271
B 14.414431
D 11.677537
E 2.675607
F 0.832411
2020-02-29 A 48.928468
B 22.011116
C 14.084507
D 11.729162
E 2.193272
F 1.053475
2020-03-31 A 54.435410
D 15.718065
C 14.577060
B 11.335682
E 2.884205
F 1.049578
Name: category_col, dtype: float64
Here is my attempt:
df.date = pd.to_datetime(df.date)
df.set_index("date", inplace=True)
df.resample("M")["category_col"].value_counts(normalize=True).mul(100).plot(kind="bar")
See the output below:
Here is what I want:
Upvotes: 1
Views: 873
Reputation: 429
First of all, to get the name of the months, reset the index and select the right columns:
df['month'] = df['date'].apply(lambda x: pd.Timestamp(x).strftime('%B'))
df = df.reset_index()
df = df[['month','category_col','value']]
Then, assuming that you have a dataframe (called df) like this:
month category_col value
September A 41.929004
September B 25.758765
Perform the following to get the plot you are looking for, using Seaborn:
import seaborn as sns
ax = sns.barplot(x="month", y="value", hue="category_col", data=df)
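For completeness, here is a minimal sketch of one way to build the long-format frame (`month`, `category_col`, `value`) that the barplot call above assumes. The sample rows and the names `month` and `long_df` are made up for illustration, and grouping by a monthly period is just one option, not necessarily how the frame above was produced:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's frame.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-12-05", "2019-12-20",
        "2020-01-03", "2020-01-15", "2020-01-28", "2020-01-30",
    ]),
    "category_col": ["A", "B", "A", "A", "B", "C"],
})

# Group rows by calendar month; periods sort chronologically.
month = df["date"].dt.to_period("M").rename("month")

# Percentage of each category within its month, in long format.
long_df = (
    df.groupby(month)["category_col"]
      .value_counts(normalize=True)
      .mul(100)
      .rename("value")
      .reset_index()
)

# Replace the period with a readable label such as "January 2020".
long_df["month"] = long_df["month"].dt.strftime("%B %Y")
```

The resulting `long_df` has exactly the three columns used above, so it can be passed as `data=long_df` to `sns.barplot`.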
Upvotes: 1
Reputation: 862441
I think you need Series.unstack with rename to format the datetimes as month name and year:
df.date = pd.to_datetime(df.date)
df = df.set_index("date")
s = df.resample("M")["category_col"].value_counts(normalize=True).mul(100)
s.unstack().rename(lambda x: x.strftime('%B %Y')).plot(kind="bar")
Sample:
print (s)
date category_col
2019-12-31 A 41.929004
B 25.758765
C 17.752111
D 9.189919
E 3.625122
F 1.745080
2020-01-31 A 54.052744
C 16.347271
B 14.414431
D 11.677537
E 2.675607
F 0.832411
2020-02-29 A 48.928468
B 22.011116
C 14.084507
D 11.729162
E 2.193272
F 1.053475
2020-03-31 A 54.435410
D 15.718065
C 14.577060
B 11.335682
E 2.884205
F 1.049578
Name: category_col, dtype: float64
print (s.unstack())
category_col A B C D E F
date
2019-12-31 41.929004 25.758765 17.752111 9.189919 3.625122 1.745080
2020-01-31 54.052744 14.414431 16.347271 11.677537 2.675607 0.832411
2020-02-29 48.928468 22.011116 14.084507 11.729162 2.193272 1.053475
2020-03-31 54.435410 11.335682 14.577060 15.718065 2.884205 1.049578
print (s.unstack().rename(lambda x: x.strftime('%B %Y')))
category_col A B C D E F
date
December 2019 41.929004 25.758765 17.752111 9.189919 3.625122 1.745080
January 2020 54.052744 14.414431 16.347271 11.677537 2.675607 0.832411
February 2020 48.928468 22.011116 14.084507 11.729162 2.193272 1.053475
March 2020 54.435410 11.335682 14.577060 15.718065 2.884205 1.049578
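The steps above can also be tried end to end on a small made-up frame. This is only a sketch under a few assumptions: the sample rows and the names `month` and `wide` are hypothetical, and it groups by a monthly period instead of calling `resample("M")`, because pandas 2.2 and later deprecate the `"M"` resample alias in favor of `"ME"`:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's frame.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-12-05", "2019-12-20",
        "2020-01-03", "2020-01-15", "2020-01-28", "2020-01-30",
    ]),
    "category_col": ["A", "B", "A", "A", "B", "C"],
})

# Monthly period for each row (swapped in here for resample("M")).
month = df["date"].dt.to_period("M").rename("date")

# Percentage of each category within its month (MultiIndex Series).
s = df.groupby(month)["category_col"].value_counts(normalize=True).mul(100)

# One row per month, one column per category; categories that do not
# occur in a month are filled with 0 instead of NaN.
wide = s.unstack(fill_value=0).rename(lambda p: p.strftime("%B %Y"))
```

Calling `wide.plot(kind="bar")` on the result then draws one group of bars per month, with one bar per category.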
Upvotes: 1