Reputation: 89

Python multiple lines by year

I have a dataset consisting of two columns: Date ds and volume y. I would like to see how the daily avg volume is trending across different months and years. I would like to have month names on x-axis and avg vol on y-axis. The lines should represent different years. Here is sample dataset and where I am stuck.

df = pd.DataFrame([
    {"ds":"2017-01-01","y":3},
    {"ds":"2017-01-18","y":4},
    {"ds":"2017-02-04","y":6},
    {"ds":"2018-01-06","y":2},
    {"ds":"2018-01-12","y":8},
    {"ds":"2018-02-08","y":2},
    {"ds":"2018-03-02","y":8},
    {"ds":"2018-03-15","y":2},
    {"ds":"2018-03-22","y":8},
    ])
df["ds"] = pd.to_datetime(df["ds"])
df.set_index("ds",inplace=True)
df.resample("M").mean().plot()

Desired output

Upvotes: 2

Answers (2)

DYZ

Reputation: 57033

You must group by years and by months:

import calendar # to use months' proper names
means = df.groupby([df.index.month, df.index.year]).mean()\
          .unstack().reset_index(0, drop=True)\
          .rename(dict(enumerate(calendar.month_abbr[1:])))
#ds   2017  2018
#ds             
#Jan   3.5   5.0
#Feb   6.0   2.0
#Mar   NaN   6.0

Upvotes: 2

jezrael

Reputation: 862711

Solution with aggregate mean for month names with years, reshape by Series.unstack and last ploting:

df["ds"] = pd.to_datetime(df["ds"])
#if necessary sorting
#df = df.sort_values('ds')
df1 = (df.groupby([df["ds"].dt.strftime('%b'), df["ds"].dt.year], sort=False)['y']       
         .mean()
         .unstack(fill_value=0))
print (df1)
ds   2017  2018
ds             
Jan   3.5   5.0
Feb   6.0   2.0
Mar   0.0   6.0

df1.plot()

Upvotes: 2

Python multiple lines by year

Answers (2)

Related Questions