ChrisArmstrong

Reputation: 2521

Pandas NaT's to -1?

In [22]: ts
Out[22]:
<class 'pandas.tseries.index.DatetimeIndex'>
[NaT, ..., 2012-12-31 00:00:00]
Length: 11, Freq: None, Timezone: None

In [23]: ts.year
Out[23]: array([  -1, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012])

This happens when using apply as well:

ts.apply(lambda x: pd.Timestamp(x).year)

0       -1
1     2012
2     2012
3     2012
4     2012
5     2012
6     2012
7     2012
8     2012
9     2012
10    2012
Name: Dates

Is it a bug that NaT.year == -1?

Upvotes: 1

Views: 7498

Answers (1)

abarnert

Reputation: 365707

What makes you think this is a bug, rather than defined behavior?

First:

In [16]: pandas.NaT.year
Out[16]: -1

So, there's nothing odd about it being in a DatetimeIndex; that's how NaT always works.

And it's entirely internally consistent, as well as consistent with lots of other stuff in numpy and elsewhere that uses -1 as a special value for (hopefully unsigned) integral types.

Yes, -1 doesn't really work as a NaN, since you can do arithmetic with it and get non-NaN (and incorrect) results, and it does odd things in some other cases (try pandas.NaT.isoformat()), but what other option is there? As long as year is defined to be some kind of numpy integral type, it has to return an integral value. So, what are the options?

  • Return either an int or None. Then calling year returns an array(dtype=object).
  • Return a float, so NaT.year can be NaN.
  • Raise an exception for NaT.year itself, or when trying to do it within an array.
  • Return some special integer value. If not -1, what value would you use?
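Here is a minimal sketch of how three of these options behave, using plain numpy arrays as stand-ins for the result of ts.year. The array names are made up for illustration, and nothing here calls pandas, so it doesn't depend on what any particular pandas version returns for NaT.year.

```python
import numpy as np

# Option 1: int-or-None forces dtype=object, so fast integer ops are lost.
years_object = np.array([None, 2012, 2012], dtype=object)

# Option 2: a float array lets the missing entry be NaN, which propagates.
years_float = np.array([np.nan, 2012.0, 2012.0])

# Option 4: an integer sentinel keeps an integer dtype, but arithmetic
# silently treats -1 as if it were a real year.
years_sentinel = np.array([-1, 2012, 2012], dtype=np.int64)

print(years_object.dtype)     # object
print(years_float.mean())     # nan
print(years_sentinel.mean())  # 1341.0, plausible-looking but wrong

# With a sentinel, the caller has to mask it out explicitly.
valid = years_sentinel[years_sentinel != -1]
print(valid.mean())           # 2012.0
```

So the sentinel keeps the dtype simple, at the cost of making every caller remember to filter out -1 before doing arithmetic.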

They all suck in different ways, but the last seems to suck least, and be the most consistent with everything else in the universe. The ideal solution might be to have integer-with-NaN types in numpy, but that's a much larger issue than designing a wrapper around numpy datetimes…

By the way, it's worth noting that numpy 1.6 doesn't have a NaT value for datetime64, so a pandas.NaT actually maps to datetime64(-1), for exactly the same reasons. Now that numpy 1.7 has np.datetime64('NaT'), that could change. But that still doesn't change the fact that integers don't have a NaN.
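For what it's worth, here is a short sketch of the numpy side, assuming a numpy recent enough to have both np.datetime64('NaT') and np.isnat (the latter arrived in a later release than 1.7). It suggests that numpy's own NaT is also stored as a reserved integer, so the sentinel idea doesn't go away; it just moves down a layer.

```python
import numpy as np

nat = np.datetime64("NaT", "ns")

# numpy itself recognizes the value as missing.
print(np.isnat(nat))  # True

# Underneath, it is a reserved int64 value (the minimum), not a NaN.
as_int = nat.astype(np.int64)
print(as_int == np.iinfo(np.int64).min)  # True
```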

Upvotes: 3
