Reputation: 17144
I am trying to plot a time series after grouping by month, but the x-axis labels still show years instead of months. How can I get months on the x-axis and a separate curve for each year?
Here is my attempt:
import numpy as np
import pandas as pd
import statsmodels.api as sm
df = pd.DataFrame.from_records(sm.datasets.co2.load().data)
df['index'] = pd.to_datetime(df['index'])
df = df.set_index('index')
ts = df['co2']['1960':]
ts = ts.bfill()
ts = ts.resample('MS').sum()
ts.groupby(ts.index.month).plot()
Desired output: month names on the x-axis and a separate curve for each year.
The plot should look something like this:
Upvotes: 2
Views: 11001
Reputation: 17144
You can start with this:
ts.groupby([ts.index.month,ts.index.year]).sum().unstack().plot(figsize=(12,8))
import numpy as np
import pandas as pd
import calendar
import seaborn as sns
sns.set(color_codes=True)
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv',
                 parse_dates=['date'], index_col=['date'])
ts = df['value']
df_plot = ts.groupby([ts.index.month,ts.index.year]).sum().unstack()
df_plot
fig, ax = plt.subplots(figsize=(12,8))
df_plot.plot(ax=ax,legend=False)
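# xticks: the index of df_plot holds month numbers 1-12, so place the ticks there
months = [calendar.month_abbr[i] for i in range(1,13)]
ax.set_xticks(range(1,13))
ax.set_xticklabels(months)
# label each curve with its year just right of the last point
for col in df_plot.columns:
    ax.annotate(str(col), xy=(df_plot.index[-1], df_plot[col].iloc[-1]),
                xytext=(5, 0), textcoords='offset points', va='center')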
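To see what the groupby/unstack step produces without downloading anything, here is a minimal sketch on a synthetic monthly series. The three-year date range and the values are assumptions made up for illustration; they are not the AirPassengers data.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series covering three full years (illustrative only)
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
ts = pd.Series(np.arange(36, dtype=float), index=idx)

# Group by (month, year), then move the year level into the columns:
# rows are month numbers 1-12, and each column is one year, i.e. one curve
df_plot = ts.groupby([ts.index.month, ts.index.year]).sum().unstack()

print(df_plot.shape)
print(df_plot.head())
```

Because the row index is the month number, calling df_plot.plot() on this frame draws one line per year against months 1-12, which is why the tick labels can then be swapped for month names.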
Upvotes: 1
Reputation: 409
I think you're looking for pandas.to_datetime() and then the .month or .year property of the datetime index.
Also, passing as_pandas=True to the statsmodels loader makes your code a bit shorter.
Anyway, if you want month on the x-axis with year as the hue, I recommend using seaborn over matplotlib.
import pandas as pd
import statsmodels.api as sm
import seaborn as sns
df = sm.datasets.co2.load(as_pandas=True).data
df['month'] = pd.to_datetime(df.index).month
df['year'] = pd.to_datetime(df.index).year
sns.lineplot(x='month',y='co2',hue='year',data=df.query('year>1995')) # filtered over 1995 to make the plot less cluttered
this gives
Upvotes: 3