user7786493

Reputation: 473

Pandas: Calling df.loc[] from an index consisting of pd.datetime

Say I have a df as follows:

import pandas as pd

a = pd.DataFrame([[1, 3]] * 3, columns=['a', 'b'], index=['5/4/2017', '5/6/2017', '5/8/2017'])
a.index = pd.to_datetime(a.index, format='%m/%d/%Y')

The type of a.index is now

<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

When selecting a row from a DataFrame with a DatetimeIndex, it is possible to pass the date as a string instead of a datetime object. In the case above, if I want the row for 5/4/2017, I can simply pass the date string to .loc as follows:

print(a.loc['5/4/2017'])

And we do not need to input the datetime object

print(a.loc[pd.datetime(2017,5,4)])

My question is: when selecting data with .loc using a date string, how does pandas know whether the string follows m-d-y, d-m-y, or some other ordering? In the case above, I used a.loc['5/4/2017'] and it successfully returned the value. Why doesn't pandas think it might mean April 5, which is not in this index?

Upvotes: 2

Views: 1486

Answers (1)

Steven Walton

Reputation: 406

Here's my best shot:

Pandas has an internal helper called _guess_datetime_format (its exact module path depends on the pandas version). This is what gets called when passing the 'infer_datetime_format' argument to pandas.to_datetime. It takes a string, runs through a list of candidate formats, and returns its best guess at the format to use when converting that string to a datetime object.

Indexing a DatetimeIndex with a string may use a similar approach.

I did some testing to see what would happen in the case you described - where a dataframe contains both the date 2017-04-05 and 2017-05-04.

In this case, the following:

df.loc['5/4/2017']

returned the data for May 4th, 2017, and

df.loc['4/5/2017']

returned the data for April 5th, 2017.

Attempting to look up 4/5/2017 in your original DataFrame gave an "is not in the [index]" error (a KeyError).
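For anyone who wants to repeat the test, here is a rough sketch of what I ran. The frame df and its 'label' column are made up for illustration, and the exact shape of what .loc returns (a Series or a one-row DataFrame) may differ between pandas versions.

```python
import pandas as pd

# Hypothetical test frame: the index holds both candidate readings of "5/4/2017".
idx = pd.to_datetime(['2017-04-05', '2017-05-04'])
df = pd.DataFrame({'label': ['april_5', 'may_4']}, index=idx)

# String lookups: the first number is read as the month.
row_may = df.loc['5/4/2017']
row_april = df.loc['4/5/2017']
print(row_may)
print(row_april)

# The frame from the question only contains May dates,
# so the April 5 reading of '4/5/2017' is not in the index.
a = pd.DataFrame(
    [[1, 3]] * 3,
    columns=['a', 'b'],
    index=pd.to_datetime(['5/4/2017', '5/6/2017', '5/8/2017'], format='%m/%d/%Y'),
)
try:
    a.loc['4/5/2017']
    missing_key_raised = False
except KeyError:
    missing_key_raised = True
print(missing_key_raised)
```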

Based on this, my conclusion is that the parser pandas uses here (_guess_datetime_format or something similar) defaults to a month-first "%m/%d/%Y" reading whenever the string cannot be distinguished from "%d/%m/%Y". This is the standard date format in the US.
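If you would rather not rely on that default, one option is to parse the string yourself with an explicit format and pass the resulting Timestamp to .loc. This is only a sketch of that idea, reusing the frame from the question; the variable names month_first and day_first are mine.

```python
import pandas as pd

a = pd.DataFrame(
    [[1, 3]] * 3,
    columns=['a', 'b'],
    index=pd.to_datetime(['5/4/2017', '5/6/2017', '5/8/2017'], format='%m/%d/%Y'),
)

# Spell out the intended ordering instead of letting pandas guess.
month_first = pd.to_datetime('5/4/2017', format='%m/%d/%Y')  # May 4, 2017
day_first = pd.to_datetime('5/4/2017', format='%d/%m/%Y')    # April 5, 2017

# Indexing with a Timestamp leaves no room for ambiguity.
row = a.loc[month_first]
print(row)

# The day-first reading points at a date this index does not contain.
print(day_first in a.index)
```

The same string produces two different Timestamps depending on the format, so being explicit matters most when the data comes from a day-first locale.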

Upvotes: 1
