Reputation: 369
Why does Matplotlib behave strangely with date data types?
From the documentation "Fixing common date annoyances": "Matplotlib allows you to natively plot python datetime instances, and for the most part does a good job picking tick locations and string formats."
I also read this question, which gave me some clues about Matplotlib date formats, as well as the most-voted questions about matplotlib and datetime, but I still don't understand the following behaviour.
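from datetime import datetime
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

#timestamp is a <class 'list'>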
timestamp=['2019-02-04', '2019-01-15', '2018-10-08', '2018-07-09',
'2018-04-09', '2018-02-08', '2017-09-08', '2017-09-08',
'2017-07-07', '2017-04-07', '2017-01-09', '2016-10-07',
'2016-07-01', '2016-03-25', '2015-12-27', '2015-09-25',
'2015-06-26', '2015-03-27', '2014-12-24', '2014-10-06',
'2014-07-02', '2014-03-28', '2013-12-20', '2013-09-27',
'2013-06-11', '2013-03-27', '2012-12-27', '2012-09-26',
'2012-06-13', '2012-03-28', '2011-12-14', '2011-09-28',
'2011-06-14', '2011-03-30', '2010-12-15', '2010-09-29',
'2010-06-19', '2010-03-31', '2009-12-29', '2009-09-30',
'2009-06-17', '2009-04-01', '2008-12-20', '2008-08-25',
'2008-08-25', '2008-06-19', '2008-03-19', '2008-03-19',
'2006-04-11', '2005-12-27', '2005-09-28', '2005-07-02',
'2005-04-20', '2004-12-21', '2004-10-20', '2004-07-21',
'2003-09-22', '2003-08-20', '2002-12-31']
#time_python is a list of <class 'datetime.datetime'>
time_python=[datetime.strptime(d, "%Y-%m-%d") for d in timestamp]
#time_series is a <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
time_series=pd.to_datetime(timestamp)
array=np.arange(1,len(timestamp)+1)
#time_2_num is a numpy array of Matplotlib date numbers (floats)
time_2_num=mdates.date2num(time_series.to_pydatetime())
#First plot using the list of strings as x axis
plt.subplot(411)
plt.bar(timestamp,array)
plt.xticks(rotation='vertical')
#Second plot using the pandas DatetimeIndex as x axis
plt.subplot(412)
plt.bar(time_series,array)
plt.xticks(rotation='vertical')
plt.subplots_adjust(hspace = 1.2)
#Third plot using the list of datetime objects as x axis
plt.subplot(413)
plt.bar(time_python,array)
plt.xticks(rotation='vertical')
plt.subplots_adjust(hspace = 1.2)
#Fourth plot using Matplotlib date numbers (floats) as x axis
plt.subplot(414)
plt.bar(time_2_num,array)
plt.xticks(rotation='vertical')
plt.subplots_adjust(hspace = 1.2)
plt.gcf().autofmt_xdate()
plt.show()
The desired result is the first plot.
I want to understand why the bars in the second, third and fourth plots look different from those in the first, even though the y input is the same for all four plots.
Upvotes: 0
Views: 241
Reputation: 339052
First, the difference becomes more obvious if you remove the line plt.gcf().autofmt_xdate(), because that line removes the tick labels from all but the last plot.
First Plot
The first plot is a "categorical" plot. The values for the x axis are strings. They are shown one by one, in the order they appear in the input list/array, and each gets its own label. In this case Matplotlib does not know that the strings represent dates; indeed, you could supply a list of fruits instead (["Apple", "Banana", "Cherry", ...]).
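A minimal sketch of this categorical behaviour, assuming a headless Agg backend and a three-element subset of the question's dates (the fruit names are just the placeholder from the text above). It checks where the bars end up: string categories are placed at 0, 1, 2, ... in input order, so a gap of three weeks and a gap of sixteen years look identical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Date-like strings in the same (reverse-chronological) order as the question
dates_as_text = ["2019-02-04", "2019-01-15", "2002-12-31"]
fruits = ["Apple", "Banana", "Cherry"]
heights = [1, 2, 3]

fig, (ax_dates, ax_fruits) = plt.subplots(2, 1)
bars_dates = ax_dates.bar(dates_as_text, heights)
bars_fruits = ax_fruits.bar(fruits, heights)
fig.canvas.draw()  # render once so the tick labels are populated


def bar_centers(bars):
    """x position of each bar centre in data coordinates."""
    return [rect.get_x() + rect.get_width() / 2 for rect in bars]


# Both axes place their bars at the same integer positions, in input order:
# Matplotlib never parses the date strings.
print(bar_centers(bars_dates), bar_centers(bars_fruits))
print([label.get_text() for label in ax_dates.get_xticklabels()])
```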
Second / Third plot
This is the intended behaviour for datetime plots in Matplotlib. Matplotlib works equally well with datetime and numpy.datetime64 objects. The axis is a true scale, in the sense of a line with a defined linear metric (i.e. the distance between Monday and Wednesday is twice as large as the distance between Saturday and Sunday). Concerning the units of such datetime axes, the documentation states:
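Matplotlib represents dates using floating point numbers specifying the number of days since 0001-01-01 UTC, plus 1.
(That was the default when this was written; since Matplotlib 3.3 the default epoch is 1970-01-01, configurable via rcParams["date.epoch"].)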
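As a worked example of the unit convention, here is a small sketch using the earliest date in the question's data. Under the quoted pre-3.3 convention, "days since 0001-01-01, plus 1" is exactly Python's proleptic Gregorian ordinal (date.toordinal()), which is why values like 731000 correspond to roughly 2002. It assumes a Matplotlib version of 3.3 or later with the default epoch of 1970-01-01.

```python
from datetime import date, datetime

import matplotlib.dates as mdates

d = datetime(2002, 12, 31)  # earliest date in the question's data

# Pre-3.3 convention quoted above: days since 0001-01-01 UTC, plus 1.
old_style = d.toordinal()  # 731215

# Matplotlib >= 3.3 default: days since the epoch 1970-01-01T00:00:00.
new_style = d.toordinal() - date(1970, 1, 1).toordinal()  # 12052

print(mdates.get_epoch())  # the epoch currently in use (rcParams["date.epoch"])
print(old_style, new_style, mdates.date2num(d))
```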
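Because Matplotlib recognizes the datetime input, it automatically chooses a date locator and formatter so that the ticks end up at useful locations. Note also that the default bar width of 0.8 is measured in these same units, i.e. 0.8 days, which is why the bars look like thin lines on an axis spanning about 16 years.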
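The sketch below illustrates the second and third plots under a few assumptions: a headless Agg backend, a three-date subset of the question's data, and an arbitrary width of 60 days chosen only for illustration. Both the list of datetime objects and the pandas DatetimeIndex place each bar at its date2num value (chronological position), the default bar width is 0.8 days, and the axis gets a date locator without any extra code.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

timestamp = ["2019-02-04", "2019-01-15", "2002-12-31"]
time_python = [datetime.strptime(d, "%Y-%m-%d") for d in timestamp]
time_series = pd.to_datetime(timestamp)
heights = np.arange(1, len(timestamp) + 1)

fig, (ax_py, ax_pd, ax_wide) = plt.subplots(3, 1)
bars_py = ax_py.bar(time_python, heights)                # list of datetime objects
bars_pd = ax_pd.bar(time_series, heights)                # pandas DatetimeIndex
bars_wide = ax_wide.bar(time_python, heights, width=60)  # width given in days


def bar_centers(bars):
    """x position of each bar centre, in Matplotlib date units (days)."""
    return np.array([r.get_x() + r.get_width() / 2 for r in bars])


# Both datetime inputs land at their date2num values (differences near zero),
# not at 0, 1, 2 as in the categorical plot.
print(bar_centers(bars_py) - mdates.date2num(time_python))
print(bar_centers(bars_pd) - mdates.date2num(time_python))

# Default width is 0.8 *days*; an explicit width is also read as days.
print(bars_py[0].get_width(), bars_wide[0].get_width())

# Matplotlib picked a date locator for the datetime axis on its own.
print(type(ax_py.xaxis.get_major_locator()).__name__)
```

If the goal is a true time axis with visible bars, passing a width in days (as in the third axes) is one option; if the goal is evenly spaced, labelled bars like the first plot, the string input is the simpler route.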
Fourth plot
The fourth plot is in principle identical to the two above. The only difference is that Matplotlib has no way of knowing that the numbers (like 731000) are meant to denote dates (they could just as well be distances between the Earth and satellites).
You can still get the same appearance as in the two plots above by setting a locator and formatter manually, e.g. by adding the following lines to the last plot:
loc = mdates.AutoDateLocator()
plt.gca().xaxis.set_major_locator(loc)
plt.gca().xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
This results in the same plot as the second and third plots.
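A self-contained sketch of that fix, assuming a headless Agg backend and a three-date subset of the question's data. It compares an axis fed raw date numbers (which keeps the generic numeric AutoLocator) with one where the AutoDateLocator/AutoDateFormatter pair from the snippet above has been set explicitly.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np

timestamp = ["2019-02-04", "2019-01-15", "2002-12-31"]
# Plain floats: the date information is gone as far as the axis can tell.
time_2_num = mdates.date2num([datetime.strptime(d, "%Y-%m-%d") for d in timestamp])
heights = np.arange(1, len(timestamp) + 1)

fig, (ax_raw, ax_fixed) = plt.subplots(2, 1)
ax_raw.bar(time_2_num, heights)    # generic numeric ticks (e.g. 735000)
ax_fixed.bar(time_2_num, heights)

# Tell the second axis explicitly that its floats are dates.
loc = mdates.AutoDateLocator()
ax_fixed.xaxis.set_major_locator(loc)
ax_fixed.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))

print(type(ax_raw.xaxis.get_major_locator()).__name__)
print(type(ax_fixed.xaxis.get_major_locator()).__name__)
```

Matplotlib also offers Axes.xaxis_date(), which marks the x axis as holding dates and should have a similar effect; the explicit locator/formatter pair shown here gives more direct control over the tick choice.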
Upvotes: 1
Reputation: 1672
For the first graph, you are passing timestamp, which contains only strings, so Matplotlib treats the values as strings. If you look at that plot on its own, you will see that the labels are just the strings from timestamp, in the same order.
The other three methods convert these strings into dates (datetime objects, or date numbers in the fourth case), which Matplotlib treats differently.
import matplotlib.pyplot as plt
import numpy as np

timestamp=['2019-02-04', '2019-01-15', '2018-10-08', '2018-07-09',
'2018-04-09', '2018-02-08', '2017-09-08', '2017-09-08',
'2017-07-07', '2017-04-07', '2017-01-09', '2016-10-07',
'2016-07-01', '2016-03-25', '2015-12-27', '2015-09-25',
'2015-06-26', '2015-03-27', '2014-12-24', '2014-10-06',
'2014-07-02', '2014-03-28', '2013-12-20', '2013-09-27',
'2013-06-11', '2013-03-27', '2012-12-27', '2012-09-26',
'2012-06-13', '2012-03-28', '2011-12-14', '2011-09-28',
'2011-06-14', '2011-03-30', '2010-12-15', '2010-09-29',
'2010-06-19', '2010-03-31', '2009-12-29', '2009-09-30',
'2009-06-17', '2009-04-01', '2008-12-20', '2008-08-25',
'2008-08-25', '2008-06-19', '2008-03-19', '2008-03-19',
'2006-04-11', '2005-12-27', '2005-09-28', '2005-07-02',
'2005-04-20', '2004-12-21', '2004-10-20', '2004-07-21',
'2003-09-22', '2003-08-20', '2002-12-31']
array=np.arange(1,len(timestamp)+1)
plt.bar(timestamp,array)
plt.xticks(rotation='vertical')
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
plt.show()
Upvotes: 0