Reputation: 931
I have a DataFrame with a single day column:
| | day |
|----:|:--------------------------|
| 0 | 2021-08-28 00:00:00+00:00 |
| 1 | 2021-08-28 02:00:00+00:00 |
| 2 | 2021-08-28 04:00:00+00:00 |
| ... | ... |
| n | 2021-08-28 16:00:00+00:00 |
>>> df.dtypes
day datetime64[ns, UTC]
dtype: object
I noticed that pandas returns different datetime types depending on whether I index the column directly or sample it, so the values have to be converted before they can be compared.
>>> df.day[0]
Timestamp('2021-08-28 00:00:00+0000', tz='UTC')
>>> type(df.day[0])
pandas._libs.tslibs.timestamps.Timestamp
>>> df.day.sample(1).values[0]
numpy.datetime64('2021-09-04T12:00:00.000000000')
>>> type(df.day.sample(1).values[0])
numpy.datetime64
What's going on? Why does pandas use different data-types in the two scenarios?
Upvotes: 0
Views: 561
Reputation: 1481
Pandas stores datetime columns in NumPy's underlying datetime64 type rather than as Timestamp objects (Timestamp is a datetime.datetime subclass). The reason is simple: performance. When you retrieve a single value from the Series, though, pandas returns a Timestamp object, which is more convenient to work with since it supports all datetime.datetime methods. Going through .values skips that step and exposes the raw NumPy array, which is why you get a numpy.datetime64 there.
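To make the difference concrete, here is a minimal sketch. The DataFrame is made-up data shaped like the one in the question (the 2-hour spacing and row count are assumptions), and the variable names are only illustrative. It suggests that the type depends on whether you go through .values, not on whether you sample:

```python
import numpy as np
import pandas as pd

# Hypothetical data resembling the question: tz-aware UTC datetimes, 2 hours apart.
df = pd.DataFrame(
    {"day": pd.date_range("2021-08-28", periods=9, freq=pd.Timedelta(hours=2), tz="UTC")}
)

# Indexing the Series boxes the element as a pandas Timestamp.
ts = df["day"].iloc[0]

# .values hands back the underlying NumPy array, so elements are numpy.datetime64
# (tz-naive, expressed in UTC).
raw = df["day"].values[0]

# Sampling without .values still yields a Timestamp.
sampled = df["day"].sample(1).iloc[0]

# To compare the two kinds of value, convert the NumPy scalar back to a
# tz-aware Timestamp first.
raw_as_ts = pd.Timestamp(raw).tz_localize("UTC")

print(type(ts).__name__, type(raw).__name__, type(sampled).__name__)
print(raw_as_ts == ts)
```

If you want to stay in pandas types throughout, using .iloc[0] (or .item() on a one-element Series) instead of .values[0] avoids the conversion altogether.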
Upvotes: 1