Reputation: 71
I have a dataset of 12 values whose index is of datetime64 type, and I want to plot a bar graph of the data with the x-ticks showing the month name in English. I have used the MonthLocator and DateFormatter classes of matplotlib. They work for one dataset but not for the other: the month labels on the x-ticks are wrong. January should be the first label.
Dataset --> full_corr
                corr
timestamp
2010-01-31  0.367613
2010-02-28  0.178960
2010-03-31  0.217788
2010-04-30  0.146214
2010-05-31  0.201297
2010-06-30  0.609486
2010-07-31  0.659257
2010-08-31  0.397254
2010-09-30  0.729701
2010-10-31  0.916465
2010-11-30  0.533646
2010-12-31  0.893937
Code used -->
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
plt.bar(full_corr.index, full_corr['corr'], width=10)  # the month labels come out wrong with this
ax = plt.gca()
locator = mdates.MonthLocator()
month_fmt = mdates.DateFormatter('%b')
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(month_fmt)
Output is --> Output Plot
But when I plot the dataframe directly with df.plot(kind="bar"), the x-ticks are shown properly in the full datetime format.
Upvotes: 1
Views: 646
Reputation: 3630
The issue is that the matplotlib.dates MonthLocator places the ticks on the first day of each month by default, whereas the DatetimeIndex of your dataset has a so-called 'month end' frequency. Each bar is therefore drawn on the last day of its month, right next to the tick for the following month. Here are two simple solutions to this problem.
Solution 1: add bymonthday=-1 to MonthLocator
plt.bar(full_corr.index, full_corr['corr'], width=10)
ax = plt.gca()
locator = mdates.MonthLocator(bymonthday=-1)
month_fmt = mdates.DateFormatter('%b')
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(month_fmt)
Solution 2: resample the dataframe to a 'month start' frequency
full_corr_ms = full_corr.resample('MS').sum()
plt.bar(full_corr_ms.index, full_corr_ms['corr'], width=10)
ax = plt.gca()
locator = mdates.MonthLocator()
month_fmt = mdates.DateFormatter('%b')
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(month_fmt)
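As a quick check of the diagnosis above, here is a minimal standard-library sketch (no matplotlib involved) that finds, for a few month-end dates from the question, the closest first-of-month date, which is where the default MonthLocator puts its ticks. The helper names month_end and nearest_month_start are made up for this illustration.

```python
import calendar
from datetime import date, timedelta


def month_end(year, month):
    """Last calendar day of the given month (hypothetical helper)."""
    return date(year, month, calendar.monthrange(year, month)[1])


def nearest_month_start(d):
    """First-of-month date closest to d (hypothetical helper)."""
    this_start = d.replace(day=1)
    # Jumping 32 days from the 1st always lands in the next month.
    next_start = (this_start + timedelta(days=32)).replace(day=1)
    return min((this_start, next_start), key=lambda s: abs((d - s).days))


for month in (1, 2, 3):
    end = month_end(2010, month)
    print(end, "->", nearest_month_start(end))
# 2010-01-31 -> 2010-02-01
# 2010-02-28 -> 2010-03-01
# 2010-03-31 -> 2010-04-01
```

Every month-end date lands one day before the next month's tick, which is why the January bar appears to carry the Feb label.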
Upvotes: 1
Reputation: 29982
The problem is that 2010-01-31 is too close to 2010-02-01, so when you set width to 10, the January bar overlaps the Feb tick.
https://i.sstatic.net/EDewS.png
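The overlap can be worked out by hand. The sketch below assumes matplotlib's defaults, where plt.bar centers each bar on its x value and a width of 10 on a date axis means 10 days; it uses only the standard library to compute the span of the January bar.

```python
from datetime import date, timedelta

center = date(2010, 1, 31)           # x position of the January bar
half_width = timedelta(days=10) / 2  # width=10 is read as 10 days

left = center - half_width
right = center + half_width
feb_tick = date(2010, 2, 1)          # default MonthLocator tick for February

print(left, right)                   # 2010-01-26 2010-02-05
print(left < feb_tick < right)       # True
```

The bar runs from 26 January to 5 February, so the Feb tick falls inside it.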
One solution is to convert 2010-01-31 to 2010-01. pd.to_datetime converts a series to datetime, and pd.Series.dt.strftime converts a datetime series to strings in the desired format. Converting those strings back with pd.to_datetime then places each value on the first day of its month.
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
full_corr = pd.read_csv("1.csv")
# The next two lines are equivalent to: full_corr['timestamp'] = pd.to_datetime(full_corr['timestamp']).dt.strftime('%Y-%m')
full_corr['timestamp'] = pd.to_datetime(full_corr['timestamp'])
full_corr['timestamp'] = full_corr['timestamp'].apply(lambda x: datetime.datetime.strftime(x, '%Y-%m'))
full_corr['timestamp'] = pd.to_datetime(full_corr['timestamp'])
plt.bar(full_corr['timestamp'], full_corr['corr'], width=10)  # bars now sit on the first day of each month
ax = plt.gca()
locator = mdates.MonthLocator()
month_fmt = mdates.DateFormatter('%b')
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(month_fmt)
plt.show()
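The round trip through a '%Y-%m' string is what moves each date to the start of its month. A minimal sketch of the same idea with the standard library only, using datetime.strptime as a stand-in for pd.to_datetime:

```python
from datetime import datetime

stamp = datetime(2010, 1, 31)

# Formatting with '%Y-%m' drops the day...
as_text = stamp.strftime("%Y-%m")
# ...and parsing the result back fills in day 1 by default.
round_tripped = datetime.strptime(as_text, "%Y-%m")

print(as_text)               # 2010-01
print(round_tripped.date())  # 2010-01-01
```

After the conversion the bars sit on the same dates as the default MonthLocator ticks, so the labels line up.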
Upvotes: 1