random_user

Reputation: 71

Why are month x-ticks labelled wrongly with a datetime index?

I have a dataset of 12 values with a datetime64 index, and I want to plot a bar graph of the data with x-ticks showing the month name in English. I used matplotlib's MonthLocator and DateFormatter. They work for one dataset but not for this one: the month labels on the x-ticks are wrong. January should be the first label.

Dataset --> full_corr

              corr
timestamp   
2010-01-31  0.367613
2010-02-28  0.178960
2010-03-31  0.217788
2010-04-30  0.146214
2010-05-31  0.201297
2010-06-30  0.609486
2010-07-31  0.659257
2010-08-31  0.397254
2010-09-30  0.729701
2010-10-31  0.916465
2010-11-30  0.533646
2010-12-31  0.893937

Code used -->

plt.bar(full_corr.index, full_corr['corr'], width=10) # some bugs are there
ax = plt.gca()
locator = mdates.MonthLocator()
month_fmt = mdates.DateFormatter('%b')
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(month_fmt)

Output is --> Output Plot

But when I plot the dataframe directly with df.plot(kind="bar"), the x-ticks are shown properly in the full datetime format.

Upvotes: 1

Views: 646

Answers (2)

Patrick FitzGerald

Reputation: 3630

The issue is that the matplotlib.dates MonthLocator places the ticks on the first day of each month by default, whereas the DatetimeIndex of your dataset has a so-called 'month end' frequency. The bar for 2010-01-31 therefore sits right next to the tick for 1 February and appears to be labelled 'Feb'. Here are two simple solutions to this problem.

Solution 1: add bymonthday=-1 to MonthLocator

plt.bar(full_corr.index, full_corr['corr'], width=10)
ax = plt.gca()
locator = mdates.MonthLocator(bymonthday=-1)
month_fmt = mdates.DateFormatter('%b')
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(month_fmt)
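
As a rough check of what bymonthday=-1 selects, the sketch below uses dateutil's rrule directly (matplotlib's MonthLocator is built on it) to list the dates such a monthly rule generates for 2010. The start date and count are chosen here only for illustration, and no plotting is involved.

```python
from datetime import datetime
from dateutil.rrule import rrule, MONTHLY

# Monthly rule restricted to the last day of each month (bymonthday=-1).
# dtstart and count are illustrative values, not read from the plot.
month_ends = list(
    rrule(MONTHLY, bymonthday=-1, dtstart=datetime(2010, 1, 1), count=12)
)

for d in month_ends:
    print(d.date())
```

The dates it prints are the same month-end dates as the index of full_corr, which is why each tick then lines up with the centre of a bar.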


Solution 2: resample the dataframe to a 'month start' frequency

full_corr_ms = full_corr.resample('MS').sum()
plt.bar(full_corr_ms.index, full_corr_ms['corr'], width=10)
ax = plt.gca()
locator = mdates.MonthLocator()
month_fmt = mdates.DateFormatter('%b')
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(month_fmt)
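
To see what the resample does to the index, here is a minimal sketch using the first three rows of the question's data, rebuilt by hand (the names sample and sample_ms are made up for this example).

```python
import pandas as pd

# First three rows of the question's data, rebuilt by hand for illustration.
sample = pd.DataFrame(
    {"corr": [0.367613, 0.178960, 0.217788]},
    index=pd.to_datetime(["2010-01-31", "2010-02-28", "2010-03-31"]),
)

# Each month holds exactly one row, so sum() just carries that value over
# while the index label moves to the first day of the month.
sample_ms = sample.resample("MS").sum()
print(sample_ms)
```

Because every month has a single row, the values are unchanged and only the labels move. If a month had no rows at all, sum() would report 0 for it, so another aggregation such as first() may be preferable when gaps are possible.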

Upvotes: 1

Ynjxsjmh

Reputation: 29982

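The problem is that 2010-01-31 is only one day away from 2010-02-01, where MonthLocator puts the 'Feb' tick. With width set to 10 (days), the January bar extends five days either side of 2010-01-31, so it covers the 'Feb' tick.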

https://i.sstatic.net/EDewS.png
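
The overlap can be checked with plain date arithmetic. This sketch assumes plt.bar's default align='center' and that a width of 10 on a date axis means 10 days. The variable names are only for illustration.

```python
from datetime import date, timedelta

center = date(2010, 1, 31)   # x position of the January bar
width = timedelta(days=10)   # width=10 on a date axis is 10 days

# With the default align='center', the bar extends half the width each side.
left = center - width / 2
right = center + width / 2

feb_tick = date(2010, 2, 1)  # where MonthLocator() places the 'Feb' tick
covers_feb_tick = left <= feb_tick <= right

print(left, right, covers_feb_tick)
```

The January bar runs from 26 January to 5 February, so the 'Feb' tick falls inside it.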

A solution is to truncate each date to its month, so that 2010-01-31 becomes 2010-01 (which parses back as 2010-01-01).

import pandas as pd
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

full_corr = pd.read_csv("1.csv")

# The next two lines are equivalent to:
# full_corr['timestamp'] = pd.to_datetime(full_corr['timestamp']).dt.strftime('%Y-%m')
full_corr['timestamp'] = pd.to_datetime(full_corr['timestamp'])
full_corr['timestamp'] = full_corr['timestamp'].apply(lambda x: datetime.datetime.strftime(x, '%Y-%m'))

# Parse the 'YYYY-MM' strings back to datetimes; each lands on the first of its month
full_corr['timestamp'] = pd.to_datetime(full_corr['timestamp'])

plt.bar(full_corr['timestamp'], full_corr['corr'], width=10)
ax = plt.gca()

locator = mdates.MonthLocator()
month_fmt = mdates.DateFormatter('%b')

ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(month_fmt)

plt.show()

Upvotes: 1
