Daniel

Reputation: 101

Indexing a pandas dataframe with datetime.date index leads to KeyError

I retrieve some data from my MySQL database. This data has the date (not datetime) in one column and some other random data in the other columns. Let's say dtf is my dataframe. There is no index yet, so I set one:

dtf.set_index('date', inplace=True)

Now I would like to get data from a specific date, so I write for example:

dtf.loc['2000-01-03']

or just

dtf['2000-01-03']

This gives me a KeyError:

KeyError: '2000-01-03'

But I know it's in there; dtf.head() shows me that.
So I took a look at the type of the index of the first row:

type(dtf.index[0])

and it tells me: datetime.date. All good. Now if I just type

dtf.index

the output is

Index([2000-01-03, 2000-01-04, 2000-01-05, 2000-01-06, 2000-01-07, 2000-01-10,
       2000-01-11, 2000-01-12, 2000-01-13, 2000-01-14,
       ...
       2015-09-09, 2015-09-10, 2015-09-11, 2015-09-14, 2015-09-15, 2015-09-16,
       2015-09-17, 2015-09-18, 2015-09-21, 2015-09-22],
       dtype='object', name='date', length=2763)

I am a bit confused about the dtype='object'. Shouldn't this read datetime.date?

If I use datetime in my MySQL table instead of date, everything works like a charm. Is this a bug or a feature? I would really like to use datetime.date because it describes my data best.

My pandas version is 0.17.0
I am using Python 3.5.0
My OS is Arch Linux

Upvotes: 7

Views: 16500

Answers (2)

cottontail

Reputation: 23261

When you convert df.index to dtype datetime64 using pd.to_datetime, each element of the index becomes a pandas Timestamp, which is a subclass of datetime.datetime. You can verify:

import datetime
import pandas as pd

# sample data: an object-dtype index holding datetime.date values
df = pd.DataFrame({'A': range(5)}, index=pd.date_range('2000-01-01','2000-01-05', 5).date)

df.index = pd.to_datetime(df.index)
isinstance(df.index[0], datetime.datetime)       # True

As Andy Hayden mentioned, once you convert the index into datetime64, you can do the sort of indexing OP wants, such as

df.loc['2000-01-03']
# or for range of dates
df.loc['2000-01-03':'2000-01-05']

Besides, midnight (00:00:00) time components are not displayed when the dtype is datetime64, so the index looks exactly the same as before.
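
As a minimal sketch of the points above (the variable names and sample values are made up for illustration), the conversion can be checked like this: the elements become Timestamp objects, every one of them sits at midnight, and the .date accessor gives the original datetime.date values back.

```python
import datetime
import pandas as pd

# Hypothetical sample data: an object-dtype index of datetime.date values
dates = [datetime.date(2000, 1, d) for d in range(1, 6)]
df = pd.DataFrame({"A": range(5)}, index=dates)

# Convert a copy so the original frame keeps its datetime.date index
converted = df.copy()
converted.index = pd.to_datetime(converted.index)

first = converted.index[0]
print(type(first).__name__)                  # Timestamp
print(isinstance(first, datetime.datetime))  # True, Timestamp subclasses datetime.datetime

# Every entry is at midnight, which is why no time part is displayed
all_midnight = converted.index.equals(converted.index.normalize())
print(all_midnight)

# .date turns the entries back into plain datetime.date objects
round_trip = list(converted.index.date)
print(round_trip == dates)
```

So nothing is lost by converting: the dates can be recovered whenever plain datetime.date values are needed.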

That said, if you want to keep datetime.date in the index, you can still select rows by passing datetime.date objects as keys instead of strings. For example, to select values on 2000-01-03, you can use either loc or query:

df = pd.DataFrame({'A': range(5)}, index=pd.date_range('2000-01-01','2000-01-05', 5).date) 

df.loc[datetime.date(2000, 1, 3)]
# or
df.query("index == @datetime.date(2000, 1, 3)")

If you need to select all rows between two dates, query is very convenient (between works too):

date1 = datetime.date(2000, 1, 3)
date2 = datetime.date(2000, 1, 5)

df.query("@date1 <= index <= @date2")
# or
df[df.index.to_series().between(date1, date2)]
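
To compare the two approaches side by side, here is a small sketch (the frame names by_date and by_ts are hypothetical) that builds the same data once with a datetime.date index and once with a DatetimeIndex, then selects the same date range from each.

```python
import datetime
import pandas as pd

days = pd.date_range("2000-01-01", periods=5, freq="D")

# Same data, two index flavours
by_date = pd.DataFrame({"A": range(5)}, index=days.date)  # object index of datetime.date
by_ts = pd.DataFrame({"A": range(5)}, index=days)         # DatetimeIndex

date1 = datetime.date(2000, 1, 3)
date2 = datetime.date(2000, 1, 5)

# datetime.date index: compare against datetime.date objects
picked_by_date = by_date[by_date.index.to_series().between(date1, date2)]

# DatetimeIndex: slice with strings, both endpoints are included
picked_by_ts = by_ts.loc["2000-01-03":"2000-01-05"]

print(picked_by_date["A"].tolist())
print(picked_by_ts["A"].tolist())
```

Both selections return the rows for 3, 4, and 5 January, so the choice is mostly about which key type is more convenient to write.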

Upvotes: 0

Andy Hayden

Reputation: 375675

You should use datetime64/Timestamp rather than plain datetime.date (or datetime.datetime) objects:

dtf.index = pd.to_datetime(dtf.index)

This gives you a DatetimeIndex, which lets you do nifty things like look up rows with loc by date string:

dtf.loc['2000-01-03']

You won't be able to do that with an object-dtype index of datetime.date (or datetime.datetime) values.
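
A minimal sketch that reproduces the behaviour from the question, assuming a small hand-built frame in place of the MySQL data (the column name value and its contents are made up): the string lookup fails on the datetime.date index and succeeds after conversion.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for the frame loaded from MySQL
dates = [
    datetime.date(2000, 1, 3),
    datetime.date(2000, 1, 4),
    datetime.date(2000, 1, 5),
]
dtf = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=pd.Index(dates, name="date"))

print(dtf.index.dtype)  # object, because the index holds plain Python date objects

# A string key is not equal to any datetime.date label, hence the KeyError
try:
    dtf.loc["2000-01-03"]
    string_lookup_worked = True
except KeyError:
    string_lookup_worked = False
print(string_lookup_worked)

# After conversion the index is a DatetimeIndex and string keys are parsed as dates
converted = dtf.copy()
converted.index = pd.to_datetime(converted.index)
row = converted.loc["2000-01-03"]
print(row)
```

This also explains the dtype='object' in the question: pandas has no dedicated dtype for datetime.date, so such values are stored as generic Python objects.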

Upvotes: 6
