JA-pythonista

Reputation: 1323

Plot group bar charts with matplotlib or Seaborn with Datetime Index in Python

I have a pandas DataFrame that consists of a date column and a category column of interest. I would like to see the frequency count of each category for each month. When I tried this with matplotlib, I got a plot that looks quite bad.

Here is what the frame looks like when grouped by the months:

df.resample("M")["category_col"].value_counts(normalize=True).mul(100)

Output

date                         category_col      
2019-12-31  A                41.929004
            B                25.758765
            C                17.752111
            D                9.189919
            E                3.625122
            F                1.745080
2020-01-31  A                54.052744
            C                16.347271
            B                14.414431
            D                11.677537
            E                2.675607
            F                0.832411
2020-02-29  A                48.928468
            B                22.011116
            C                14.084507
            D                11.729162
            E                2.193272
            F                1.053475
2020-03-31  A                54.435410
            D                15.718065
            C                14.577060
            B                11.335682
            E                2.884205
            F                1.049578
Name: category_col, dtype: float64

Here is my attempt:

df.date = pd.to_datetime(df.date)
df.set_index("date", inplace=True)
df.resample("M")["category_col"].value_counts(normalize=True).mul(100).plot(kind="bar")

See the output below:

[image: plot produced by the code above]

Here is what I want:

[image: desired grouped bar chart]

Upvotes: 1

Views: 873

Answers (2)

Matias Eiletz

Reputation: 429

First of all, to get the name of the months, reset the index and select the right columns:

# Reset the index first so that "date" is available as a regular column
df = df.reset_index()

df['month'] = pd.to_datetime(df['date']).dt.strftime('%B')

df = df[['month','category_col','value']]

Then, assuming that you have a dataframe (called df) like this:

month       category_col     value      
September   A                41.929004
September   B                25.758765

Perform the following to get the plot you are looking for, using Seaborn:

import seaborn as sns 
ax = sns.barplot(x="month", y="value", hue="category_col", data=df)
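For completeness, here is a minimal end-to-end sketch of how the long-form frame above could be built from the raw data. It assumes the raw frame has a DatetimeIndex named date and a category_col column, as in the question. The helper names monthly_shares_long and plot_monthly_shares and the demo data are made up for illustration. It groups on to_period("M") rather than resample("M"), and renames the result to value before reset_index so the Series name cannot clash with the category_col index level.

```python
import pandas as pd


def monthly_shares_long(df):
    """Return one row per (month, category) with the share in percent."""
    shares = (
        df.groupby(df.index.to_period("M"))["category_col"]
        .value_counts(normalize=True)
        .mul(100)
        .rename("value")  # avoid a name clash with the category_col index level
    )
    # Name both index levels explicitly so reset_index yields predictable columns.
    shares.index = shares.index.set_names(["date", "category_col"])
    long_df = shares.reset_index()
    long_df["month"] = long_df["date"].dt.strftime("%B %Y")
    return long_df[["month", "category_col", "value"]]


def plot_monthly_shares(long_df):
    """Grouped bar chart: one group per month, one bar per category."""
    # Imported here so the reshaping above works without seaborn installed.
    import seaborn as sns

    return sns.barplot(
        data=long_df,
        x="month",
        y="value",
        hue="category_col",
        hue_order=sorted(long_df["category_col"].unique()),
    )


# Hypothetical demo data, not taken from the question.
demo = pd.DataFrame(
    {"category_col": list("AAB") + list("ABBB")},
    index=pd.to_datetime(
        ["2019-12-05", "2019-12-10", "2019-12-20",
         "2020-01-03", "2020-01-15", "2020-01-20", "2020-01-28"]
    ),
)
demo.index.name = "date"

long_df = monthly_shares_long(demo)
print(long_df)
```

Calling plot_monthly_shares(long_df) should then draw the grouped bars. Passing hue_order keeps the category colours stable even though value_counts sorts each month by frequency.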

Upvotes: 1

jezrael

Reputation: 862441

I think you need Series.unstack to turn the categories into columns, followed by rename to format the datetimes as month name and year:

df.date = pd.to_datetime(df.date)
df = df.set_index("date")

s = df.resample("M")["category_col"].value_counts(normalize=True).mul(100)

s.unstack().rename(lambda x: x.strftime('%B %Y')).plot(kind="bar")

Sample:

print (s)
date        category_col
2019-12-31  A               41.929004
            B               25.758765
            C               17.752111
            D                9.189919
            E                3.625122
            F                1.745080
2020-01-31  A               54.052744
            C               16.347271
            B               14.414431
            D               11.677537
            E                2.675607
            F                0.832411
2020-02-29  A               48.928468
            B               22.011116
            C               14.084507
            D               11.729162
            E                2.193272
            F                1.053475
2020-03-31  A               54.435410
            D               15.718065
            C               14.577060
            B               11.335682
            E                2.884205
            F                1.049578
Name: category_col, dtype: float64

print (s.unstack())
category_col          A          B          C          D         E         F
date                                                                        
2019-12-31    41.929004  25.758765  17.752111   9.189919  3.625122  1.745080
2020-01-31    54.052744  14.414431  16.347271  11.677537  2.675607  0.832411
2020-02-29    48.928468  22.011116  14.084507  11.729162  2.193272  1.053475
2020-03-31    54.435410  11.335682  14.577060  15.718065  2.884205  1.049578

print (s.unstack().rename(lambda x: x.strftime('%B %Y')))
category_col           A          B          C          D         E         F
date                                                                         
December 2019  41.929004  25.758765  17.752111   9.189919  3.625122  1.745080
January 2020   54.052744  14.414431  16.347271  11.677537  2.675607  0.832411
February 2020  48.928468  22.011116  14.084507  11.729162  2.193272  1.053475
March 2020     54.435410  11.335682  14.577060  15.718065  2.884205  1.049578
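
If you want to try the reshaping without the original data, the following is a small self-contained sketch of the same idea. The helper names monthly_shares_wide and plot_grouped_bars and the demo data are made up for illustration. It groups on to_period("M") instead of calling resample("M"), only because recent pandas releases deprecate the "M" offset alias in favour of "ME"; the unstack step is the same. The fill_value=0 argument is an addition so that a category missing from a month shows up as a zero-height bar rather than NaN.

```python
import pandas as pd


def monthly_shares_wide(df):
    """One row per month, one column per category, values in percent."""
    shares = (
        df.groupby(df.index.to_period("M"))["category_col"]
        .value_counts(normalize=True)
        .mul(100)
    )
    # Move the category level into columns; absent categories become 0.
    wide = shares.unstack(fill_value=0)
    # Replace the period labels with "Month Year" strings for the x axis.
    wide.index = wide.index.strftime("%B %Y")
    return wide


def plot_grouped_bars(wide):
    """Each row becomes a group of bars, each column one bar in the group."""
    ax = wide.plot(kind="bar", rot=0)
    ax.set_ylabel("share (%)")
    return ax


# Hypothetical demo data, not taken from the question.
demo = pd.DataFrame(
    {"category_col": list("AAB") + list("ACCC")},
    index=pd.to_datetime(
        ["2019-12-05", "2019-12-10", "2019-12-20",
         "2020-01-03", "2020-01-15", "2020-01-20", "2020-01-28"]
    ),
)
demo.index.name = "date"

wide = monthly_shares_wide(demo)
print(wide)
```

The original attempt looked bad because plotting the MultiIndex Series directly draws one bar per (date, category) pair. After unstack, DataFrame.plot treats each row as a group and each column as a bar within it, which is what plot_grouped_bars(wide) relies on.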

Upvotes: 1
