Mitchell Breitfuss

Reputation: 11

How to work around the date range limit in Pandas for plotting?

Sorry if this question has been asked before, but I can't find one that describes my issue.

Basically, I have a large climate dataset that is not bound to "real" dates. The dataset starts at year 1 and runs to year 9999. The dates are stored as strings such as Jan-01, Feb-01, Mar-01, etc., where the number indicates the year. When I try to convert this column to datetime objects, I get an out-of-range error. (From what I've read, this is due to the 64-bit limit on the timestamps pandas can represent.)

What is a good way to work around this problem/process the date information so I can effectively plot the associated data vs these dates, over this ~10,000 year period?

Thanks

Upvotes: 1

Views: 608

Answers (1)

Michael Delgado

Reputation: 15442

The cftime library was created specifically for this purpose, and xarray has a convenient xr.cftime_range function that makes creating such a range easy:

In [3]: import xarray as xr, pandas as pd

In [4]: date_range = xr.cftime_range('0001-01-01', '9999-01-01', freq='D')

In [5]: type(date_range)
Out[5]: xarray.coding.cftimeindex.CFTimeIndex

This creates a CFTimeIndex object which plays nicely with pandas:


In [8]: df = pd.DataFrame({"date": date_range, "vals": range(len(date_range))})

In [9]: df
Out[9]:
                        date     vals
0        0001-01-01 00:00:00        0
1        0001-01-02 00:00:00        1
2        0001-01-03 00:00:00        2
3        0001-01-04 00:00:00        3
4        0001-01-05 00:00:00        4
...                      ...      ...
3651692  9998-12-28 00:00:00  3651692
3651693  9998-12-29 00:00:00  3651693
3651694  9998-12-30 00:00:00  3651694
3651695  9998-12-31 00:00:00  3651695
3651696  9999-01-01 00:00:00  3651696

[3651697 rows x 2 columns]
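
If all you need is an x-axis for plotting, a different and simpler workaround is to skip datetime objects entirely and turn each label into a decimal year. The sketch below is only an illustration, not part of cftime or xarray: the helper name to_decimal_year is made up, and it assumes the labels look like "Jan-01" (an English month abbreviation, a hyphen, then the year) and that each value belongs at the start of its month.

```python
# Hypothetical helper: convert labels such as "Jan-01" into decimal years.
# Assumes English three-letter month abbreviations and "Mon-YEAR" ordering.
MONTH_ABBRS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
MONTHS = {abbr: number for number, abbr in enumerate(MONTH_ABBRS, start=1)}


def to_decimal_year(label):
    """Return year + (month - 1) / 12 for a label like 'Mar-0042'."""
    month_abbr, year_text = label.strip().split("-")
    month = MONTHS[month_abbr.title()]
    year = int(year_text)
    # Place each month at its start: January adds 0/12, February 1/12, ...
    return year + (month - 1) / 12


labels = ["Jan-01", "Feb-01", "Mar-01", "Jul-0500", "Jan-9999"]
decimal_years = [to_decimal_year(label) for label in labels]
```

The resulting floats can be stored in an ordinary numeric column and used as the x values in a plot, which avoids the timestamp range limit at the cost of losing calendar-aware features such as resampling by month.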

Upvotes: 1
