Reputation: 466
I have some code which uses the datetime.datetime format throughout. I have data in X.csv:
EventDatetime | RunnerNumber
"20220203 1024"| 42
"20220203 1331"| 69
... | ...
which I have read into a pd.DataFrame, with EventDatetime in datetime.datetime format:
import datetime
import pandas as pd

dfX = pd.read_csv("X.csv", header=0,
    converters={"EventDatetime": lambda x: datetime.datetime.strptime(x, "%Y%m%d %H%M%S")}
)
(For reference, datetime.datetime.strptime("20220203 1024", "%Y%m%d %H%M%S") returns a datetime.datetime object.)
Unfortunately, Pandas seems to convert this format into its native Timestamp format, which makes indexing very annoying.
dfX.index = dfX["EventDatetime"]
type(dfX.index[1]) # returns: pandas._libs.tslibs.timestamps.Timestamp
dfX.loc[[datetime.datetime(2022, 2, 3, 10, 24), datetime.datetime(2022, 2, 3, 13, 31)]]
should return a dataframe with two rows, but instead throws an error because Pandas has converted the datetime.datetime objects into Timestamp objects.
Sadly, it seems Pandas has not kept to the Unix mantra of "do one thing, and do it well". I can find ways around this problem (various SO posts have been helpful), but what I want to know is: why does Pandas coerce these datetimes into Timestamps? Is it a bug, or a design feature which I have yet to appreciate?
Upvotes: 0
Views: 112
Reputation: 2541
This is not really an answer to the title of your question, but I can index the DataFrame with datetime.datetime objects using the following code:
import datetime
import pandas as pd

dfX = pd.DataFrame(
    {'EventDatetime': ['20220203 1024', '20220203 1331'],
     'RunnerNumber': ['42', '69']}
)
# Parse the strings with a format that has no seconds directive
dfX['EventDatetime'] = pd.to_datetime(dfX['EventDatetime'], format='%Y%m%d %H%M')
dfX.index = dfX['EventDatetime']
dfX.loc[[datetime.datetime(2022,2,3,10,24), datetime.datetime(2022,2,3,13,31)]]
Output:
RunnerNumber
EventDatetime
2022-02-03 10:24:00 42
2022-02-03 13:31:00 69
I think the key difference is that I used pd.to_datetime for the conversion, but I don't know the reason behind that difference.
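One possible reason, offered as an assumption rather than something established in the question: the two format strings are not the same. The question parses with "%Y%m%d %H%M%S", while the code above uses '%Y%m%d %H%M'. With the seconds directive present, Python's strptime can split the four digits "1024" into hour 10, minute 2, second 4, so the index would hold 10:02:04 rather than 10:24:00 and the .loc lookup would miss. pd.Timestamp is also a subclass of datetime.datetime, so the coercion by itself should not stop a datetime.datetime key from matching. A minimal sketch to check both points (the variable names are only illustrative):

```python
import datetime
import pandas as pd

raw = "20220203 1024"

# Format from the question: with %S present, the digits are split as 10 / 2 / 4.
with_seconds = datetime.datetime.strptime(raw, "%Y%m%d %H%M%S")
# Format from this answer: no seconds directive, so the minutes stay "24".
without_seconds = datetime.datetime.strptime(raw, "%Y%m%d %H%M")

print(with_seconds)     # 2022-02-03 10:02:04
print(without_seconds)  # 2022-02-03 10:24:00

# A Timestamp is a datetime.datetime subclass and compares equal to one.
ts = pd.Timestamp(without_seconds)
print(isinstance(ts, datetime.datetime))  # True
print(ts == without_seconds)              # True
```

If this is what is happening, switching the converter's format to "%Y%m%d %H%M" should make the original .loc call work as well.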
Upvotes: 1