Andrea Ciufo

Reputation: 369

Plot with Matplotlib using List - Datetime - Different Behaviour on Format

Why does Matplotlib behave so differently depending on the data type used for the dates?

From the documentation, "Fixing common date annoyances": Matplotlib allows you to natively plot Python datetime instances, and for the most part does a good job picking tick locations and string formats.

I also read this question that gave me some clues related to Matplotlib Date Format. I also read the most voted questions about matplotlib and Datetime but I still don't understand the following behaviour.

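from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

#timestamp is a list of date strings (<class 'list'>)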
timestamp=['2019-02-04', '2019-01-15', '2018-10-08', '2018-07-09',
           '2018-04-09', '2018-02-08', '2017-09-08', '2017-09-08',
           '2017-07-07', '2017-04-07', '2017-01-09', '2016-10-07',
           '2016-07-01', '2016-03-25', '2015-12-27', '2015-09-25',
           '2015-06-26', '2015-03-27', '2014-12-24', '2014-10-06',
           '2014-07-02', '2014-03-28', '2013-12-20', '2013-09-27',
           '2013-06-11', '2013-03-27', '2012-12-27', '2012-09-26',
           '2012-06-13', '2012-03-28', '2011-12-14', '2011-09-28',
           '2011-06-14', '2011-03-30', '2010-12-15', '2010-09-29',
           '2010-06-19', '2010-03-31', '2009-12-29', '2009-09-30',
           '2009-06-17', '2009-04-01', '2008-12-20', '2008-08-25',
           '2008-08-25', '2008-06-19', '2008-03-19', '2008-03-19',
           '2006-04-11', '2005-12-27', '2005-09-28', '2005-07-02',
           '2005-04-20', '2004-12-21', '2004-10-20', '2004-07-21',
           '2003-09-22', '2003-08-20', '2002-12-31']

#time_python is a list of <class 'datetime.datetime'> objects
time_python=[datetime.strptime(d, "%Y-%m-%d") for d in timestamp]
#time_series is a <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
time_series=pd.to_datetime(timestamp)

array=np.arange(1,len(timestamp)+1) 

time_2_num=mdates.date2num(time_series.to_pydatetime())

#First plot using the list of strings as x axis
plt.subplot(411)
plt.bar(timestamp,array)
plt.xticks(rotation='vertical')

#Second plot using the pandas DatetimeIndex as x axis
plt.subplot(412)
plt.bar(time_series,array)
plt.xticks(rotation='vertical')
plt.subplots_adjust(hspace = 1.2)

#Third plot using the list of datetime objects as x axis
plt.subplot(413)
plt.bar(time_python,array)
plt.xticks(rotation='vertical')
plt.subplots_adjust(hspace = 1.2)

#Fourth plot using Matplotlib date numbers as x axis
plt.subplot(414)
plt.bar(time_2_num,array)
plt.xticks(rotation='vertical')
plt.subplots_adjust(hspace = 1.2)

plt.gcf().autofmt_xdate()  

plt.show()

The desired result is obviously the first plot.


I want to understand why the bars in plots II, III and IV are drawn so differently from those in plot I. The y input is the same for all four plots.

Upvotes: 0

Views: 241

Answers (2)

ImportanceOfBeingErnest

Reputation: 339052

First, the difference becomes more obvious if you remove the line plt.gcf().autofmt_xdate(), because that line removes the labels from all but the last plot.


First Plot

The first plot is a "categorical" plot. The values for the x axis are strings. They are shown one by one in the order they appear in the input list/array, and each gets its own label. In this case matplotlib does not know that the strings represent dates; you could just as well supply a list of fruits instead (["Apple", "Banana", "Cherry", ...]).
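To make the categorical behaviour concrete, here is a minimal sketch (the short labels list and the Agg backend are assumptions chosen for illustration, not part of the question). It suggests that each distinct string gets an integer slot in order of first appearance, so a repeated string lands on the slot it already has. If that holds for the question's data, the duplicated dates there (for example '2017-09-08') would be drawn on top of each other.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Date-like strings, deliberately out of chronological order, with one repeat.
labels = ["2019-02-04", "2017-09-08", "2018-10-08", "2017-09-08"]
heights = [1, 2, 3, 4]

fig, ax = plt.subplots()
bars = ax.bar(labels, heights)

# Bar centres in data coordinates: one integer slot per distinct string,
# assigned in order of first appearance, not in chronological order.
centres = [round(b.get_x() + b.get_width() / 2, 6) for b in bars]
print(centres)
```

The spacing is uniform because the axis only counts categories; the calendar distance between two strings plays no role.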

Second / Third plot

This is the intended behaviour for datetime plots in matplotlib. Matplotlib works with datetime or numpy.datetime64 objects equally well. The axis is a true scale in the sense of a line with a defined linear metric (i.e. the distance between Monday and Wednesday is twice as large as that between Saturday and Sunday). Concerning the units of such datetime axes, the documentation states:

Matplotlib represents dates using floating point numbers specifying the number of days since 0001-01-01 UTC, plus 1.
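The linear metric can be checked directly with mdates.date2num. The sketch below uses four arbitrarily chosen dates from February 2019 (an assumption for illustration) and only looks at differences, because the absolute numbers depend on the epoch: the quote above describes the epoch used at the time, while Matplotlib 3.3 and later default to 1970-01-01.

```python
from datetime import datetime

import matplotlib.dates as mdates

monday = datetime(2019, 2, 4)
wednesday = datetime(2019, 2, 6)
saturday = datetime(2019, 2, 9)
sunday = datetime(2019, 2, 10)

# date2num returns a float number of days, so differences are day counts
# regardless of which epoch the installed Matplotlib version uses.
mon_to_wed = mdates.date2num(wednesday) - mdates.date2num(monday)
sat_to_sun = mdates.date2num(sunday) - mdates.date2num(saturday)

print(mon_to_wed, sat_to_sun)
```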

Because matplotlib recognizes the datetime input, it automatically chooses a date locator and formatter so that the ticks land at useful locations.
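A small sketch of what this means in practice (the three dates and the width of 60 are illustrative assumptions): after plotting datetime objects, the x axis carries a date locator, and bar widths are measured in data units, which on a date axis are days. With the default width of 0.8, each bar is therefore less than a day wide on an axis spanning years, which would explain the hairline bars in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

dates = [datetime(2018, 1, 9), datetime(2018, 4, 9), datetime(2018, 7, 9)]
heights = [1, 2, 3]

fig, ax = plt.subplots()
thin = ax.bar(dates, heights)            # default width=0.8, i.e. 0.8 days
wide = ax.bar(dates, heights, width=60)  # 60 data units, i.e. 60 days

# The locator and formatter were installed automatically by the date converter.
locator = ax.xaxis.get_major_locator()
formatter = ax.xaxis.get_major_formatter()
print(type(locator).__name__, type(formatter).__name__)
print(thin[0].get_width(), wide[0].get_width())
```

Passing a larger width is one way to get visibly thick bars while keeping the true time scale.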

Fourth plot

The fourth plot is in principle identical to the two above. The only difference is that matplotlib has no way of knowing that the numbers (like 731000) are meant to denote dates (they could just as well be distances between the earth and satellites).

You can still get the same appearance as in the two plots above by manually setting a locator and formatter, e.g. adding the following lines to the last plot

loc = mdates.AutoDateLocator()
plt.gca().xaxis.set_major_locator(loc)
plt.gca().xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))

This results in the same plot as the second and third plots.
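As an alternative sketch, Axes.xaxis_date() can be used to declare that plain numbers on the x axis are date numbers. The dates and the bar width below are illustrative assumptions; the before/after comparison only inspects which locator the axis ends up with.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

dates = [datetime(2018, 1, 9), datetime(2018, 4, 9), datetime(2018, 7, 9)]
as_numbers = mdates.date2num(dates)  # plain floats, no date information attached

fig, ax = plt.subplots()
ax.bar(as_numbers, [1, 2, 3], width=20)

before = ax.xaxis.get_major_locator()  # generic numeric locator
ax.xaxis_date()                        # treat the x data as Matplotlib dates
after = ax.xaxis.get_major_locator()   # date-aware locator

print(type(before).__name__, "->", type(after).__name__)
```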


Upvotes: 1

erncyp

Reputation: 1672

For the first graph, you are passing timestamp, which is a list of plain strings, and Matplotlib treats the entries as strings. If you look at the plot on its own, you will see that the labels are just the strings from timestamp, in the same order.

The other 3 methods convert these strings into datetime values, which Matplotlib treats differently.

import matplotlib.pyplot as plt
import numpy as np

timestamp=['2019-02-04', '2019-01-15', '2018-10-08', '2018-07-09',
       '2018-04-09', '2018-02-08', '2017-09-08', '2017-09-08',
       '2017-07-07', '2017-04-07', '2017-01-09', '2016-10-07',
       '2016-07-01', '2016-03-25', '2015-12-27', '2015-09-25',
       '2015-06-26', '2015-03-27', '2014-12-24', '2014-10-06',
       '2014-07-02', '2014-03-28', '2013-12-20', '2013-09-27',
       '2013-06-11', '2013-03-27', '2012-12-27', '2012-09-26',
       '2012-06-13', '2012-03-28', '2011-12-14', '2011-09-28',
       '2011-06-14', '2011-03-30', '2010-12-15', '2010-09-29',
       '2010-06-19', '2010-03-31', '2009-12-29', '2009-09-30',
       '2009-06-17', '2009-04-01', '2008-12-20', '2008-08-25',
       '2008-08-25', '2008-06-19', '2008-03-19', '2008-03-19',
       '2006-04-11', '2005-12-27', '2005-09-28', '2005-07-02',
       '2005-04-20', '2004-12-21', '2004-10-20', '2004-07-21',
       '2003-09-22', '2003-08-20', '2002-12-31']

array=np.arange(1,len(timestamp)+1) 
plt.bar(timestamp,array)
plt.xticks(rotation='vertical')
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)


Upvotes: 0
