ely

Reputation: 77454

Pandas gives incorrect result when asking if Timestamp column values have attr astype

With a column containing Timestamp values, I am getting inconsistent results about whether the elements have the attribute astype:

In [30]: o.head().datetime.map(lambda x: hasattr(x, 'astype'))
Out[30]: 
0    False
1    False
2    False
3    False
4    False
Name: datetime, dtype: bool

In [31]: map(lambda x: hasattr(x, 'astype'), o.head().datetime.values)
Out[31]: [True, True, True, True, True]

In [32]: o.datetime.dtype
Out[32]: dtype('<M8[ns]')

In [33]: o.datetime.head()
Out[33]: 
0   2012-09-30 22:00:15.003000
1   2012-09-30 22:00:16.203000
2   2012-09-30 22:00:18.302000
3   2012-09-30 22:03:37.304000
4   2012-09-30 22:05:17.103000
Name: datetime, dtype: datetime64[ns]

If I pick off the first element (or any single element) from .values and ask whether it has the attribute astype, I see that it does, and I can even convert it to other formats.

But if I try to do this to the entire column in one go, with Series.map, I get an error claiming that Timestamp objects do not have the attribute astype (though they clearly do).

How can I achieve mapping the operation to the column with Pandas? Is this a known error?

Version: pandas 0.13.0, numpy 1.8

Added

It appears to be some sort of implicit casting on the part of either pandas or numpy:

In [50]: hasattr(o.head().datetime[0], 'astype')
Out[50]: False

In [51]: hasattr(o.head().datetime.values[0], 'astype')
Out[51]: True

Upvotes: 1

Views: 1108

Answers (1)

unutbu

Reputation: 879939

pandas Timestamps do not have an astype method, but numpy.datetime64 objects do.

NDFrame.values returns a numpy array. o.head().datetime.values returns a numpy array of dtype numpy.datetime64, which is why the following reports True for every element:

In [31]: map(lambda x: hasattr(x, 'astype'), o.head().datetime.values)
Out[31]: [True, True, True, True, True]

Note that Series.__iter__ is defined this way:

def __iter__(self):
    if com.is_categorical_dtype(self.dtype):
        return iter(self.values)
    elif np.issubdtype(self.dtype, np.datetime64):
        return (lib.Timestamp(x) for x in self.values)
    elif np.issubdtype(self.dtype, np.timedelta64):
        return (lib.Timedelta(x) for x in self.values)
    else:
        return iter(self.values)

So, when the dtype of the Series is np.datetime64, iteration over the Series returns Timestamps. This is where the implicit conversion takes place.
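
As a rough illustration of the difference, and one possible way to apply a numpy-style conversion to the whole column, here is a minimal sketch. The sample data and the variable names (s, first, raw, as_seconds, per_element) are made up for the example, and it assumes a reasonably recent pandas where Timestamp.to_datetime64 is available; it is not taken from the original session.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column standing in for o.datetime
s = pd.Series(pd.to_datetime(["2012-09-30 22:00:15.003",
                              "2012-09-30 22:00:16.203"]))

# Element access on the Series hands back a pandas Timestamp
first = s.iloc[0]
print(type(first).__name__)

# .values exposes the underlying numpy array of datetime64 scalars,
# and those scalars do have astype
raw = s.values
print(type(raw[0]).__name__, hasattr(raw[0], "astype"))

# Option 1: convert the whole underlying array at once (here to second precision)
as_seconds = s.values.astype("datetime64[s]")

# Option 2: convert each Timestamp back to a numpy scalar explicitly
per_element = [ts.to_datetime64() for ts in s]
```

Working on s.values avoids the Timestamp boxing entirely, so it is usually the simpler route when the goal is a numpy dtype conversion.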

Upvotes: 2
