Reputation: 89
I have a dataset consisting of two columns: Date ds
and volume y
. I would like to see how the daily avg volume is trending across different months and years. I would like to have month names on x-axis and avg vol on y-axis. The lines should represent different years. Here is sample dataset and where I am stuck.
df = pd.DataFrame([
{"ds":"2017-01-01","y":3},
{"ds":"2017-01-18","y":4},
{"ds":"2017-02-04","y":6},
{"ds":"2018-01-06","y":2},
{"ds":"2018-01-12","y":8},
{"ds":"2018-02-08","y":2},
{"ds":"2018-03-02","y":8},
{"ds":"2018-03-15","y":2},
{"ds":"2018-03-22","y":8},
])
df["ds"] = pd.to_datetime(df["ds"])
df.set_index("ds",inplace=True)
df.resample("M").mean().plot()
Upvotes: 2
Views: 65
Reputation: 57033
You must group by years and by months:
import calendar # to use months' proper names
means = df.groupby([df.index.month, df.index.year]).mean()\
.unstack().reset_index(0, drop=True)\
.rename(dict(enumerate(calendar.month_abbr[1:])))
#ds 2017 2018
#ds
#Jan 3.5 5.0
#Feb 6.0 2.0
#Mar NaN 6.0
Upvotes: 2
Reputation: 862711
Solution with aggregate mean
for month names with years, reshape by Series.unstack
and last ploting:
df["ds"] = pd.to_datetime(df["ds"])
#if necessary sorting
#df = df.sort_values('ds')
df1 = (df.groupby([df["ds"].dt.strftime('%b'), df["ds"].dt.year], sort=False)['y']
.mean()
.unstack(fill_value=0))
print (df1)
ds 2017 2018
ds
Jan 3.5 5.0
Feb 6.0 2.0
Mar 0.0 6.0
df1.plot()
Upvotes: 2