Owen

Reputation: 466

Why is Pandas insistent on storing times as Pandas Timestamps?

I have some code that uses datetime.datetime objects throughout. I have data in X.csv:

EventDatetime  |  RunnerNumber
"20220203 1024"|  42
"20220203 1331"|  69
  ...          |  ...

which I have read into a pd.DataFrame, with EventDatetime converted to datetime.datetime:

import pandas as pd
from datetime import datetime

dfX = pd.read_csv("X.csv", header=0,
                  converters={"EventDatetime": lambda x: datetime.strptime(x, "%Y%m%d %H%M%S")}
                 )

(For reference, datetime.strptime("20220203 1024", "%Y%m%d %H%M%S") returns a datetime.datetime object).

Unfortunately, Pandas seems to convert these objects into its native Timestamp type, which makes indexing very annoying.

dfX.index = dfX["EventDatetime"]
type(dfX.index[1])  # returns:  pandas._libs.tslibs.timestamps.Timestamp
dfX.loc[[datetime(2022, 2, 3, 10, 24), datetime(2022, 2, 3, 13, 31)]]

should return a dataframe with two rows, but instead throws an error because Pandas has converted the datetime.datetime objects into Timestamps.

Sadly, it seems Pandas has not kept to the Unix mantra of "do one thing, and do it well". I can find ways around this problem (various SO posts have been helpful). But what I want to know is: why does Pandas coerce these datetimes into Timestamps? Is it a bug, or a design feature which I have yet to appreciate?

Upvotes: 0

Views: 112

Answers (1)

Raymond Kwok

Reputation: 2541

This is not really an answer to the title of your question, but I can index the DataFrame with datetime.datetime objects using the following code:

import datetime
import pandas as pd

dfX = pd.DataFrame(
    {'EventDatetime': ['20220203 1024', '20220203 1331'],
     'RunnerNumber': ['42', '69']}
)
dfX['EventDatetime'] = pd.to_datetime(dfX['EventDatetime'], format='%Y%m%d %H%M')
dfX.index = dfX['EventDatetime']
dfX.loc[[datetime.datetime(2022,2,3,10,24), datetime.datetime(2022,2,3,13,31)]]

Output:

                    RunnerNumber
EventDatetime
2022-02-03 10:24:00           42
2022-02-03 13:31:00           69

I think the key difference is that I used pd.to_datetime for the conversion, but I don't know the reason behind that difference.
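
One possible explanation, offered as a sketch and not as a confirmed diagnosis of the original error: pd.Timestamp is a subclass of datetime.datetime, so the coercion by itself should not stop a plain datetime from matching the index. What does differ between the two snippets is the format string. The question's format ends in %S while the data has no seconds field, so strptime splits the four digits "1024" across hour, minute and second. The variable names below are made up for illustration.

```python
import datetime

import pandas as pd

# pd.Timestamp subclasses datetime.datetime, and the two compare equal
# when they denote the same instant.
plain = datetime.datetime(2022, 2, 3, 10, 24)
is_subclass = issubclass(pd.Timestamp, datetime.datetime)
same_instant = pd.Timestamp("2022-02-03 10:24") == plain

# Same input string, parsed with the question's format and the answer's format.
raw = "20220203 1024"
with_seconds = datetime.datetime.strptime(raw, "%Y%m%d %H%M%S")
without_seconds = datetime.datetime.strptime(raw, "%Y%m%d %H%M")

print(is_subclass, same_instant)  # True True
print(with_seconds)               # 2022-02-03 10:02:04
print(without_seconds)            # 2022-02-03 10:24:00

# With the index parsed using the format without %S, plain datetime keys work.
df = pd.DataFrame(
    {"RunnerNumber": [42, 69]},
    index=pd.to_datetime(["20220203 1024", "20220203 1331"], format="%Y%m%d %H%M"),
)
selected = df.loc[[plain, datetime.datetime(2022, 2, 3, 13, 31)]]
print(selected)
```

If that is what happened, the index in the question would hold 10:02:04 and 13:03:01, and a lookup for 10:24 and 13:31 would fail whether the keys were Timestamps or datetime.datetime objects.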

Upvotes: 1
