Reputation: 319
I have a DataFrame with datetime values spanning from year 1 to far into the future. When I import the data into pandas, the dtype is set to object, although I would like it to be datetime64 so that I can use the .dt accessor.
Consider this piece of code:
import pytz
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'dates': [datetime(108, 7, 30, 9, 25, 27, tzinfo=pytz.utc),
                             datetime(2018, 3, 20, 9, 25, 27, tzinfo=pytz.utc),
                             datetime(2529, 7, 30, 9, 25, 27, tzinfo=pytz.utc)]})
In [5]: df.dates
Out[5]:
0    0108-07-30 09:25:27+00:00
1    2018-03-20 09:25:27+00:00
2    2529-07-30 09:25:27+00:00
Name: dates, dtype: object
How can I convert it to dtype datetime64[s]? I don't really care about nanosecond or millisecond accuracy, but I would like the range.
Upvotes: 0
Views: 1160
Reputation: 365875
Pandas can generally convert to and from datetime.datetime objects:
df.dates = pd.to_datetime(df.dates)
But in your case, you can't do this, for two reasons.
First, while Pandas can convert to and from datetime.datetime, it can't handle tz-aware datetimes, and you've imbued yours with a timezone. Fortunately, this one is easy to fix: you're explicitly using UTC, and you can do that without aware objects.
Second, 64-bit nanoseconds can't handle a date range as wide as you want:
>>> (1<<64) / 1000000000 / 3600 / 24 / 365.2425
584.5540492538555
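The same arithmetic can be repeated for coarser units to see why second resolution would be enough for the requested range. This is only a back-of-the-envelope sketch; the helper name int64_span_years is made up for illustration, and it uses the same 365.2425-day year as the calculation above:

```python
# Average Gregorian year, in seconds.
SECONDS_PER_YEAR = 365.2425 * 24 * 3600


def int64_span_years(ticks_per_second):
    """Years covered by 2**64 ticks at the given resolution (hypothetical helper)."""
    return (1 << 64) / ticks_per_second / SECONDS_PER_YEAR


spans = {
    unit: int64_span_years(ticks)
    for unit, ticks in [("ns", 10**9), ("us", 10**6), ("ms", 10**3), ("s", 1)]
}

for unit, years in spans.items():
    print(f"{unit:>2}: about {years:,.0f} years")
```

At nanosecond resolution the span is roughly 584 years; each step to a coarser unit multiplies it by 1000.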
And the Pandas documentation makes this clear:
Since pandas represents timestamps in nanosecond resolution, the time span that can be represented using a 64-bit integer is limited to approximately 584 years:
In [66]: pd.Timestamp.min
Out[66]: Timestamp('1677-09-21 00:12:43.145225')
In [67]: pd.Timestamp.max
Out[67]: Timestamp('2262-04-11 23:47:16.854775807')
(It looks like they put the 0 point at the Unix epoch, which makes sense.)
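A quick way to check that reading is to offset the Unix epoch by the int64 extremes, interpreted as nanoseconds. This sketch uses only the standard library, so it truncates to microseconds (datetime cannot hold nanoseconds); the names lo and hi are illustrative:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# int64 covers -2**63 .. 2**63 - 1; treat those as nanoseconds from the epoch
# and truncate to microseconds, the finest unit datetime supports.
lo = EPOCH - timedelta(microseconds=2**63 // 1000)
hi = EPOCH + timedelta(microseconds=(2**63 - 1) // 1000)

print(lo)  # 1677-09-21 00:12:43.145225
print(hi)  # 2262-04-11 23:47:16.854775

# The first and last dates from the question fall outside this window.
print(datetime(108, 7, 30, 9, 25, 27) < lo)
print(datetime(2529, 7, 30, 9, 25, 27) > hi)
```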
But notice that the documentation links to Representing Out-of-Bounds Spans: you can use Periods, which will be less efficient and convenient than int64s, but probably more so than objects. (I believe the internal storage ends up being YYYYMMDD-style strings, but they're stored as fixed-length strings directly in the array, instead of as references to Python objects on the heap.)
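Here is a rough sketch of the Period route for the data in the question. The to_period helper is hypothetical, not a pandas function; it builds each Period from the datetime's fields, which drops the timezone (the values are assumed to be in UTC already). The frequency alias "s" is an assumption about the installed version, since older pandas releases spell second frequency "S":

```python
from datetime import datetime, timezone

import pandas as pd

dates = [
    datetime(108, 7, 30, 9, 25, 27, tzinfo=timezone.utc),
    datetime(2018, 3, 20, 9, 25, 27, tzinfo=timezone.utc),
    datetime(2529, 7, 30, 9, 25, 27, tzinfo=timezone.utc),
]


def to_period(dt, freq="s"):
    """Hypothetical helper: build a second-resolution Period from datetime fields."""
    return pd.Period(year=dt.year, month=dt.month, day=dt.day,
                     hour=dt.hour, minute=dt.minute, second=dt.second,
                     freq=freq)


# A list of Period objects is inferred as a period dtype rather than object.
periods = pd.Series([to_period(dt) for dt in dates], name="dates")

print(periods.dtype)
print(periods.dt.year.tolist())
```

The resulting Series supports the .dt accessor (for example periods.dt.year or periods.dt.month), though not every datetime64 operation has a Period equivalent.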
Upvotes: 1