Reputation: 2521
In [22]: ts
Out[22]:
<class 'pandas.tseries.index.DatetimeIndex'>
[NaT, ..., 2012-12-31 00:00:00]
Length: 11, Freq: None, Timezone: None
In [23]: ts.year
Out[23]: array([ -1, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012])
This happens when using apply as well:
ts.apply(lambda x: pd.Timestamp(x).year)
0 -1
1 2012
2 2012
3 2012
4 2012
5 2012
6 2012
7 2012
8 2012
9 2012
10 2012
Name: Dates
Is it a bug that NaT.year == -1?
Upvotes: 1
Views: 7498
Reputation: 365707
What makes you think this is a bug, rather than defined behavior?
First:
In [16]: pandas.NaT.year
Out[16]: -1
So, there's nothing odd about it being in a DatetimeIndex; that's how NaT always works.
And it's entirely internally consistent, as well as consistent with lots of other stuff in numpy and elsewhere that uses -1 as a special value for (hopefully unsigned) integral types.
Yes, -1 doesn't really work as a NaN, since you can do arithmetic with it and get non-NaN (and incorrect) results, and it does odd things in some other cases (try pandas.NaT.isoformat()), but what other option is there? As long as year is defined to be some kind of numpy integral type, it has to return an integral value. So, what are the options?
1. Make year return an object that is either an int or None. Then calling year on an index returns an array(dtype=object).
2. Make year return a float type, so that NaT.year can be NaN.
3. Raise an exception when asking for NaT.year itself, or when trying to do it within an array.
4. Return a special integral value such as -1, which is what happens now.
They all suck in different ways, but the last seems to suck least, and be the most consistent with everything else in the universe. The ideal solution might be to have integer-with-NaN types in numpy, but that's a much larger issue than designing a wrapper around numpy datetimes…
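To make the trade-offs concrete, here is a rough sketch in plain numpy (no pandas involved). The year values are made up for illustration, and the variable names are mine, not anything pandas uses internally.

```python
import numpy as np

# Hypothetical year values: one missing date followed by two real ones.

# Keep an integer dtype and use -1 as the sentinel for "missing".
years_int = np.array([-1, 2012, 2012], dtype=np.int64)

# Switch to a float dtype so the missing slot can be NaN.
years_float = np.array([np.nan, 2012, 2012], dtype=np.float64)

# Switch to object dtype so the missing slot can be None.
years_obj = np.array([None, 2012, 2012], dtype=object)

# An integer array cannot hold NaN at all: the conversion is rejected.
try:
    np.array([np.nan, 2012], dtype=np.int64)
    int_accepts_nan = True
except ValueError:
    int_accepts_nan = False

# The sentinel silently takes part in arithmetic and skews the result,
# while NaN propagates and makes the problem visible.
mean_int = years_int.mean()
mean_float = years_float.mean()

print(years_int.dtype, years_float.dtype, years_obj.dtype)
print(int_accepts_nan, mean_int, mean_float)
```

The integer version stays compact and fast but gives a wrong mean without complaint; the float and object versions flag the gap at the cost of changing the dtype everyone downstream sees.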
By the way, it's worth noting that numpy 1.6 doesn't have a NaT value for datetime64, so a pandas.NaT actually maps to datetime64(-1), for exactly the same reasons. Now that numpy 1.7 has np.datetime64('NaT'), that could change. But that still doesn't change the fact that integers don't have a NaN.
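For what it's worth, here is a small sketch of how that looks on a numpy that has NaT, assuming a reasonably recent release (np.isnat was added well after 1.7). It shows that NaT is a real datetime64 value, but as soon as you cast to integers the "missing" marker becomes an ordinary number again.

```python
import numpy as np

# A datetime64 array with one missing entry and one real date.
dates = np.array(["NaT", "2012-12-31"], dtype="datetime64[D]")

# np.isnat reports which entries are missing.
missing = np.isnat(dates)

# Casting to int64 exposes the underlying storage: days since 1970-01-01
# for real dates, and the smallest int64 as the NaT marker.
as_ints = dates.astype("int64")
nat_marker = np.iinfo(np.int64).min

print(missing)
print(as_ints)
print(as_ints[0] == nat_marker)
```

So even with a proper NaT on the datetime side, anything that hands back plain integers still has to pick some sentinel.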
Upvotes: 3