pandas.groupby --> DatetimeIndex --> groupby year

I come from JavaScript and am struggling with pandas. I need to sort my data by its DatetimeIndex and then group it by year. The CSV looks like this (I shortened it because it has more than 1300 entries):

date,value
2016-05-09,1201
2017-05-10,2329
2018-05-11,1716
2019-05-12,10539

I wrote my code like this to throw away the first and last 2.5 percent of the dataframe:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

df = pd.read_csv( "fcc-forum-pageviews.csv", index_col="date", parse_dates=True).sort_values('value')

df = df.iloc[int(round(len(df) * 0.025)):int(round(len(df) * 0.975)) - 1]

df = df.sort_index()

Now I need to group the rows by the year of the DatetimeIndex so that I can plot them properly with matplotlib. This is where I am stuck:

def draw_bar_plot():
    df_bar = df

    fig, ax = plt.subplots()
    fig.figure.savefig('bar_plot.png')
    return fig

I really don't know how to group by year. Doing something like:

print(df_bar.groupby(df_bar.index).first())

leads to:

             value
date              
2016-05-19   19736
2016-05-20   17491
2016-05-26   18060
2016-05-27   19997
2016-05-28   19044
...            ...
2019-11-23  146658
2019-11-24  138875
2019-11-30  141161
2019-12-01  142918
2019-12-03  158549

How do I group this by year? It would also help to see how to plot the grouped data correctly as a bar chart with matplotlib.

Upvotes: 0

Views: 370

Answers (1)

imdevskp

Reputation: 2223

This groups the rows by the year of the DatetimeIndex and sums the values within each year:

df_year_wise_sum = df.groupby([df.index.year]).sum()
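To make the grouping easier to follow, here is a minimal sketch on a tiny frame. The first, second and fourth rows are taken from the sample in the question; the third and fifth rows are made up so that two of the years have more than one entry. `df.index.year` gives one integer per row, and `groupby` uses those integers as the key. Swapping `.sum()` for `.mean()` would give yearly averages instead, if that is what the chart should show.

```python
import pandas as pd

# Rows 1, 2 and 4 come from the question's sample; rows 3 and 5 are
# invented for illustration so that two years have more than one entry.
df = pd.DataFrame(
    {"value": [1201, 2329, 100, 10539, 61]},
    index=pd.to_datetime(
        ["2016-05-09", "2017-05-10", "2017-06-01", "2019-05-12", "2019-07-04"]
    ),
)
df.index.name = "date"

# df.index.year holds the year of every row; grouping by it
# collapses all rows that share a year into a single row.
df_year_wise_sum = df.groupby(df.index.year).sum()
df_year_wise_mean = df.groupby(df.index.year).mean()  # alternative aggregation

print(df_year_wise_sum)
```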

These lines draw the yearly totals as a bar plot, save it to a file, and display it:

df_year_wise_sum.plot(kind='bar')
plt.savefig('bar_plot.png')
plt.show()
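If the plot has to live inside the `draw_bar_plot` function from the question, something along these lines might work. This is only a sketch under a few assumptions: the function takes the frame as an argument instead of reading a global `df`, the `path` parameter and axis labels are my own additions, the sample values are partly made up, and the `Agg` backend is selected so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd


def draw_bar_plot(df, path="bar_plot.png"):
    """Draw one bar per year and save the figure (path is a hypothetical default)."""
    # Sum the values of all rows that fall in the same year.
    df_bar = df.groupby(df.index.year).sum()

    # Draw on an explicit Axes so the returned figure is the one that was plotted.
    fig, ax = plt.subplots(figsize=(8, 5))
    df_bar.plot(kind="bar", ax=ax, legend=False)
    ax.set_xlabel("Year")
    ax.set_ylabel("Total value")

    fig.savefig(path)
    return fig


# Illustrative data only: the second 2016 row is invented.
sample = pd.DataFrame(
    {"value": [1201, 100, 2329]},
    index=pd.to_datetime(["2016-05-09", "2016-06-01", "2017-05-10"]),
)
fig = draw_bar_plot(sample)
```

Passing `ax=ax` to `plot` is the important part: without it pandas creates its own figure, and the `fig` returned by `plt.subplots()` stays empty.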

Upvotes: 1