Reputation: 3
I come from JavaScript and am struggling with pandas. I need to sort data by its DatetimeIndex and then group it by year. The CSV looks like this (shortened, since it has more than 1300 entries):
date,value
2016-05-09,1201
2017-05-10,2329
2018-05-11,1716
2019-05-12,10539
I wrote my code like this to throw away the first and last 2.5 percent of the dataframe:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
df = pd.read_csv( "fcc-forum-pageviews.csv", index_col="date", parse_dates=True).sort_values('value')
df = df.iloc[int(round(len(df) / 100 * 2.5)):int(round(len(df) / 100 * 97.5)) - 1]  # 2.5 and 97.5 need a decimal point; "2,5" is parsed as a tuple
df = df.sort_index()
Now I need to group my DatetimeIndex by year so I can plot it properly with matplotlib. This is where I am stuck:
def draw_bar_plot():
    df_bar = df
    fig, ax = plt.subplots()
    fig.figure.savefig('bar_plot.png')
    return fig
I really don't know how to group by year. Doing something like:
print(df_bar.groupby(df_bar.index).first())
leads to:
             value
date
2016-05-19   19736
2016-05-20   17491
2016-05-26   18060
2016-05-27   19997
2016-05-28   19044
...            ...
2019-11-23  146658
2019-11-24  138875
2019-11-30  141161
2019-12-01  142918
2019-12-03  158549
How can I group this by year? It would also help to see how to plot the grouped data correctly as a bar chart with matplotlib.
Upvotes: 0
Views: 370
Reputation: 2223
This groups the data by year and sums the values within each year:
df_year_wise_sum = df.groupby([df.index.year]).sum()
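To see what the grouping does on its own, here is a minimal sketch. The `sample` frame is made-up data shaped like the CSV in the question, not the real file; the only thing it relies on is that the index is a DatetimeIndex, which exposes a `.year` attribute that can serve as the grouping key.

```python
import pandas as pd

# Hypothetical sample data shaped like the CSV in the question
sample = pd.DataFrame(
    {"value": [1201, 2329, 1716, 10539, 500]},
    index=pd.to_datetime(
        ["2016-05-09", "2017-05-10", "2018-05-11", "2019-05-12", "2019-06-01"]
    ),
)
sample.index.name = "date"

# DatetimeIndex.year gives one integer per row; rows sharing a year are grouped together
yearly_sum = sample.groupby(sample.index.year).sum()
print(yearly_sum)
```

If you want the average per year rather than the total, `.mean()` can be used in place of `.sum()`.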
These lines draw the result as a bar plot, save it, and display it:
df_year_wise_sum.plot(kind='bar')
plt.savefig('bar_plot.png')
plt.show()
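Putting both steps into the `draw_bar_plot` function from the question might look roughly like the sketch below. It is not the only way to do it: passing the frame in as a parameter, the `path` argument, the axis labels, and the `demo` data are assumptions made for illustration, and the `Agg` backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def draw_bar_plot(df, path="bar_plot.png"):
    # Sum the values per calendar year taken from the DatetimeIndex
    df_bar = df.groupby(df.index.year)["value"].sum()

    # Draw onto an explicit Axes so the returned figure contains the bars
    fig, ax = plt.subplots(figsize=(8, 5))
    df_bar.plot(kind="bar", ax=ax)
    ax.set_xlabel("Year")
    ax.set_ylabel("Total value")

    fig.savefig(path)
    return fig


# Hypothetical sample data standing in for the real CSV
demo = pd.DataFrame(
    {"value": [1201, 2329, 1716, 10539]},
    index=pd.to_datetime(["2016-05-09", "2017-05-10", "2018-05-11", "2019-05-12"]),
)
fig = draw_bar_plot(demo)
```

Passing `ax=ax` to `plot` matters here: without it pandas creates its own figure, and the `fig` returned by the function would be empty.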
Upvotes: 1