Why is Pandas insistent on storing times as Pandas Timestamps?

Question

I have some code that uses datetime.datetime objects throughout. I have data in X.csv:

EventDatetime  |  RunnerNumber
"20220203 1024"|  42
"20220203 1331"|  69
  ...          |  ...

which I have read into a pandas DataFrame, parsing EventDatetime into datetime.datetime objects:

from datetime import datetime
import pandas as pd

dfX = pd.read_csv("X.csv", header=0,
                  converters={"EventDatetime": lambda x: datetime.strptime(x, "%Y%m%d %H%M")}
                 )

(For reference, datetime.strptime("20220203 1024", "%Y%m%d %H%M") returns a datetime.datetime object.)

Unfortunately, Pandas seems to convert this format into its native Timestamp format, which makes indexing very annoying.

dfX.index = dfX["EventDatetime"]
type(dfX.index[1])  # returns:  pandas._libs.tslibs.timestamps.Timestamp
dfX.loc[[datetime(2022, 2, 3, 10, 24), datetime(2022, 2, 3, 13, 31)]]

I expected this to return a DataFrame with two rows, but instead it raises an error, because Pandas has converted the datetime.datetime objects into Timestamps.
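
For anyone trying to reproduce this, here is a minimal, self-contained sketch of the setup. It assumes a comma-separated, in-memory stand-in for X.csv (the `csv_text` string is hypothetical) and a `%H%M` time format. The conversion to a DatetimeIndex is written explicitly so the result does not depend on version-specific type inference. It only shows what the index ends up holding and one way to look rows up; it does not explain the design.

```python
import io
from datetime import datetime

import pandas as pd

# Hypothetical in-memory stand-in for X.csv (comma-separated for simplicity).
csv_text = 'EventDatetime,RunnerNumber\n"20220203 1024",42\n"20220203 1331",69\n'

dfX = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    converters={"EventDatetime": lambda x: datetime.strptime(x, "%Y%m%d %H%M")},
)

# Make the conversion explicit instead of relying on inference.
dfX.index = pd.DatetimeIndex(dfX["EventDatetime"])

# Elements of a DatetimeIndex come back as pandas Timestamps,
# and Timestamp is itself a subclass of datetime.datetime.
first = dfX.index[0]
print(type(first))
print(isinstance(first, datetime))

# Look up rows using keys converted to the same representation as the index.
keys = pd.to_datetime([datetime(2022, 2, 3, 10, 24), datetime(2022, 2, 3, 13, 31)])
subset = dfX.loc[keys]
print(subset["RunnerNumber"].tolist())
```

Because Timestamp subclasses datetime.datetime, the values taken from the index can still be compared against plain datetime.datetime objects in this sketch.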

Sadly, it seems Pandas has not kept to the Unix mantra of "do one thing, and do it well". I can find ways around this problem (various SO posts have been helpful), but what I want to know is: why does Pandas coerce these datetimes into Timestamps? Is it a bug, or a design feature that I have yet to appreciate?

