Pad

Reputation: 911

Error with x-axis labels when plotting multi-index dataframe using Matplotlib

I've got a timeseries dataframe and I've calculated a season column from the datetime column. I've then indexed the dataframe by 'Season' and 'Year' and want to plot the result. Code below:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

dates = pd.date_range('20070101', periods=1000)
df = pd.DataFrame(np.random.randn(1000), columns=list('A'))
df['date'] = dates

def get_season(row):
    if row['date'].month >= 3 and row['date'].month <= 5:
        return 'spring'
    elif row['date'].month >= 6 and row['date'].month <= 8:
        return 'summer'
    elif row['date'].month >= 9 and row['date'].month <= 11:
        return 'autumn'
    else:
        return 'winter'

df['Season'] = df.apply(get_season, axis=1)
df['Year'] = df['date'].dt.year
df.loc[df['date'].dt.month == 12, 'Year'] += 1
df = df.set_index(['Year', 'Season'], inplace=False)

df.head()

fig,ax = plt.subplots()
df.plot(x_compat=True,ax=ax)

ax.xaxis.set_tick_params(reset=True)
ax.xaxis.set_major_locator(mdates.YearLocator(1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))

plt.show()

Unfortunately, this gives me the following error when plotting the x-axis labels:

File "C:\Users\myname\AppData\Local\Continuum\Anaconda\lib\site-packages\matplotlib\dates.py", line 225, in _from_ordinalf
dt = datetime.datetime.fromordinal(ix)

ValueError: ordinal must be >= 1

I want to see only the year as the x-axis label, not the year and the season.

I'm sure it's something simple that I'm doing wrong but I can't figure out what...

EDIT:

Changing the df.plot call slightly plots the dates a bit better, but the x-axis still shows months. I'd prefer to show only the year, though this is an improvement on before.

new code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

dates = pd.date_range('20070101', periods=1000)
df = pd.DataFrame(np.random.randn(1000), columns=list('A'))
df['date'] = dates

def get_season(row):
    if row['date'].month >= 3 and row['date'].month <= 5:
        return 'spring'
    elif row['date'].month >= 6 and row['date'].month <= 8:
        return 'summer'
    elif row['date'].month >= 9 and row['date'].month <= 11:
        return 'autumn'
    else:
        return 'winter'

df['Season'] = df.apply(get_season, axis=1)
df['Year'] = df['date'].dt.year
df.loc[df['date'].dt.month == 12, 'Year'] += 1
df = df.set_index(['Year', 'Season'], inplace=False)

df.head()

fig,ax = plt.subplots()
df.plot(x='date', y = 'A', x_compat=True,ax=ax)

Upvotes: 1

Views: 1446

Answers (1)

CT Zhu

Reputation: 54330

Unfortunately, the marriage between pandas and the matplotlib time locators/formatters has never been a happy one. The most consistent approach is to hold the datetime data in a NumPy array of datetime objects and plot that directly with matplotlib. pandas does provide a nice .to_pydatetime() method for this:

fig, ax = plt.subplots()
# Plot plain Python datetimes so matplotlib's date locators/formatters apply
ax.plot(dates.to_pydatetime(), df.A)
years = mdates.YearLocator()    # every year
months = mdates.MonthLocator()  # every month
yearsFmt = mdates.DateFormatter('%Y')
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
ax.xaxis.set_minor_locator(months)
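For anyone who wants to try this without the rest of the question's code, here is a minimal self-contained sketch of the same approach. The helper name plot_with_year_ticks and the Agg backend are my own choices for illustration, not part of the original code; swap the backend out if you want an interactive window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_with_year_ticks(dates, values):
    """Hypothetical helper: plot values against dates with year-only major ticks."""
    fig, ax = plt.subplots()
    # Hand matplotlib plain Python datetimes rather than letting pandas pick the x-axis
    ax.plot(dates.to_pydatetime(), values)
    ax.xaxis.set_major_locator(mdates.YearLocator())           # one major tick per year
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))   # label major ticks as e.g. 2008
    ax.xaxis.set_minor_locator(mdates.MonthLocator())          # unlabeled minor tick per month
    return fig, ax


# Same synthetic data as in the question
dates = pd.date_range('20070101', periods=1000)
values = np.random.randn(1000)
fig, ax = plot_with_year_ticks(dates, values)
fig.savefig('year_ticks.png')
```

Because the x values come straight from the DatetimeIndex, this works regardless of how the dataframe itself is indexed, so the ('Year', 'Season') MultiIndex can stay in place for grouping.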

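As for the ValueError itself, my best guess (not verified against the exact pandas/matplotlib versions in the question) is that a dataframe indexed by ('Year', 'Season') has no datetime index, so df.plot falls back to plain row positions for x. A date locator then tries to read those small integers as dates, and on older matplotlib a value below 1 is not a valid ordinal. The sketch below, on a small made-up frame, just inspects what x values end up on the line:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Small stand-in for the question's dataframe (10 rows, all in one season)
dates = pd.date_range('20070101', periods=10)
df = pd.DataFrame({'A': np.random.randn(10)})
df['Year'] = dates.year.values
df['Season'] = 'winter'
df = df.set_index(['Year', 'Season'])

fig, ax = plt.subplots()
df.plot(ax=ax)

# Inspect the x values pandas handed to matplotlib for the plotted line
xdata = np.asarray(ax.get_lines()[0].get_xdata())
print(xdata)
```

If the printed values are row positions rather than dates, applying YearLocator and DateFormatter to that axis cannot give sensible year labels, which is why plotting the datetimes directly is the safer route.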

Upvotes: 1
