Reputation: 3619
I have a pandas DataFrame that captures values over a timespan (monthly over years, daily over years, or daily over months). There is no guarantee that the time series is continuous (some months might be missing in a year).
import pandas as pd

# No guarantee that this index will have an entry for every month of the time range!
# 'M' is the month-end frequency alias (spelled 'ME' in pandas 2.2 and later)
dates = pd.date_range('1/1/2015', periods=36, freq='M')
df = pd.DataFrame(index = dates)
df['value'] = df.index.year * 0.1 + df.index.month * 0.05
df.plot()
This gives me a simple time series plot.
But what I want to make is a 'seasonal' plot. This would display each year's data as a different line on the same index of months. As a simple display:
import numpy as np
index = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
df = pd.DataFrame(index = index)
df[2015] = np.arange(12)*0.4+1
df[2016] = np.arange(12)*0.35+1.4
df[2017] = np.arange(12)*0.5+1.2
df.plot()
I'm looking for a 'pythonic' or elegant way to do this operation. My attempts at the transformation have been incredibly gross, spaghetti, garbage code. I am sure there must be some tidy approach using pandas/python to do this transformation efficiently and cleanly. In particular, I want to find an abstracted way to do this, so that I can generalize it to charts showing "seasonality" of days across a month, etc.
To start with, I'm not even sure what is a good index to build and base this chart off of.
Upvotes: 1
Views: 2463
Reputation: 863166
You can use DatetimeIndex.strftime and DatetimeIndex.year, use a sorted CategoricalIndex for correct ordering, and finally reshape with pivot:
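c = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
# Note: this array-based call follows the pandas API at the time of the answer;
# current pandas expects a DataFrame as the first argument to pd.pivot.
df = pd.pivot(index=pd.CategoricalIndex(df.index.strftime('%b'), ordered=True, categories=c),
              columns=df.index.year,
              values=df['value'])
print(df)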
       2015    2016    2017
Jan  201.55  201.65  201.75
Feb  201.60  201.70  201.80
Mar  201.65  201.75  201.85
Apr  201.70  201.80  201.90
May  201.75  201.85  201.95
Jun  201.80  201.90  202.00
Jul  201.85  201.95  202.05
Aug  201.90  202.00  202.10
Sep  201.95  202.05  202.15
Oct  202.00  202.10  202.20
Nov  202.05  202.15  202.25
Dec  202.10  202.20  202.30
df.plot()
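The question mentions that some months may be missing. As a rough sketch of one way to handle that (not part of the original answer), the pivoted table can be reindexed so that all twelve months appear, with NaN where a month has no data. The variable names `month_names` and `table` are made up for illustration, and the example pivots on integer month numbers rather than `strftime('%b')` so the result does not depend on locale.

```python
import pandas as pd

month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Toy data in the spirit of the question, using month starts ('MS')
dates = pd.date_range('2015-01-01', periods=36, freq='MS')
df = pd.DataFrame(index=dates)
df['value'] = df.index.year * 0.1 + df.index.month * 0.05

# Simulate a gap: drop every March
df = df[df.index.month != 3]

# Pivot on month number (rows) and year (columns)
table = (df.assign(month=df.index.month, year=df.index.year)
           .pivot(index='month', columns='year', values='value'))

# Force all 12 months to be present; missing months become NaN rows
table = table.reindex(range(1, 13))

# Replace the month numbers with readable labels, in calendar order
table.index = month_names
```

Calling `table.plot()` afterwards should draw one line per year, with a break in each line where a month is missing.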
Another solution is to create new columns:
df['months'] = pd.CategoricalIndex(df.index.strftime('%b'), ordered=True, categories=c)
df['years'] = df.index.year
df = df.pivot(index='months', columns='years',values='value')
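For the "abstracted" version the question asks about, the same pivot can be wrapped in a small helper. This is only a sketch under assumptions: `seasonal_table`, `period` and `position` are hypothetical names, not pandas API. Each of the two callables maps a DatetimeIndex to labels; `period` decides which line a point belongs to and `position` decides where it sits on the shared x-axis.

```python
import numpy as np
import pandas as pd


def seasonal_table(series, period, position):
    """Pivot a datetime-indexed Series into one column per period.

    `period` and `position` are hypothetical helper arguments: functions
    that take a DatetimeIndex and return one label per timestamp.
    """
    frame = pd.DataFrame({
        'position': np.asarray(position(series.index)),
        'period': np.asarray(period(series.index)),
        'value': series.to_numpy(),
    })
    # One row per position, one column per period
    return frame.pivot(index='position', columns='period', values='value')


# Toy daily data covering three years
dates = pd.date_range('2015-01-01', '2017-12-31', freq='D')
daily = pd.Series(np.arange(len(dates), dtype=float), index=dates)

# Months within each year: aggregate to monthly means first
monthly = daily.resample('MS').mean()
by_month = seasonal_table(monthly,
                          period=lambda idx: idx.year,
                          position=lambda idx: idx.month)

# Days within each month of a single year
daily_2015 = daily.loc['2015']
by_day = seasonal_table(daily_2015,
                        period=lambda idx: idx.month,
                        position=lambda idx: idx.day)
```

`by_month.plot()` or `by_day.plot()` should then give one line per period. Positions that do not exist in a period (for example day 31 in February) come out as NaN. Note that `pivot` raises a ValueError if the same (position, period) pair occurs more than once, so data at a finer resolution needs to be aggregated first, as with `resample` above.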
Upvotes: 4