Reputation: 101
I retrieve some data from my MySQL database. The data has a date (not a datetime) in one column and some other data in the remaining columns. Let's say dtf is my DataFrame. There is no index yet, so I set one:
dtf.set_index('date', inplace=True)
Now I would like to get data from a specific date, so I write for example:
dtf.loc['2000-01-03']
or just
dtf['2000-01-03']
This gives me a KeyError:
KeyError: '2000-01-03'
But I know it's in there; dtf.head() shows me that.
So I took a look at the type of the index of the first row:
type(dtf.index[0])
and it tells me: datetime.date. All good. Now if I just type
dtf.index
the output is
Index([2000-01-03, 2000-01-04, 2000-01-05, 2000-01-06, 2000-01-07, 2000-01-10,
2000-01-11, 2000-01-12, 2000-01-13, 2000-01-14,
...
2015-09-09, 2015-09-10, 2015-09-11, 2015-09-14, 2015-09-15, 2015-09-16,
2015-09-17, 2015-09-18, 2015-09-21, 2015-09-22],
dtype='object', name='date', length=2763)
I am a bit confused about the dtype='object'. Shouldn't this read datetime.date?
If I use datetime in my MySQL table instead of date, everything works like a charm. Is this a bug or a feature? I would really like to use datetime.date because it describes my data best.
My pandas version is 0.17.0.
I am using Python 3.5.0.
My OS is Arch Linux.
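For reference, here is a minimal sketch that should reproduce the behaviour without a database. The column name `value` and the three sample rows are made up for illustration; only the `date` column and the `dtf` name come from the description above.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for the rows fetched from MySQL:
# a DATE column arrives as plain datetime.date objects.
dates = [
    datetime.date(2000, 1, 3),
    datetime.date(2000, 1, 4),
    datetime.date(2000, 1, 5),
]
dtf = pd.DataFrame({"date": dates, "value": [1.0, 2.0, 3.0]})
dtf.set_index("date", inplace=True)

# The index holds Python objects, so pandas reports dtype 'object'.
index_dtype = dtf.index.dtype

# A string is not equal to any datetime.date label, hence the KeyError.
try:
    dtf.loc["2000-01-03"]
    string_lookup_failed = False
except KeyError:
    string_lookup_failed = True

# Looking up with an actual datetime.date label does find the row.
row = dtf.loc[datetime.date(2000, 1, 3)]
```

The string lookup fails because the labels are compared as ordinary Python objects, and '2000-01-03' is a str, not a date.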
Upvotes: 7
Views: 16500
Reputation: 23261
When you convert df.index to dtype datetime64 using pd.to_datetime, each element of the index becomes a pandas Timestamp, which is a subclass of datetime.datetime. You can verify:
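import datetime
import pandas as pd
# sample data: an object-dtype index holding datetime.date values
df = pd.DataFrame({'A': range(5)}, index=pd.date_range('2000-01-01','2000-01-05', 5).date)
# convert the index to a DatetimeIndex (dtype datetime64)
df.index = pd.to_datetime(df.index)
isinstance(df.index[0], datetime.datetime) # True, because Timestamp subclasses datetime.datetime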
As Andy Hayden mentioned, once you convert the index to datetime64, you can do the kind of indexing the OP wants, such as
df.loc['2000-01-03']
# or for range of dates
df.loc['2000-01-03':'2000-01-05']
Besides, midnight time components are not displayed even when the dtype is datetime64, so visually the index looks exactly the same as one holding plain dates.
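As a rough check of the points above, the sketch below converts a date index and inspects what comes out. The sample frame is an assumption for illustration, not the OP's data.

```python
import datetime
import pandas as pd

# Illustrative data: five consecutive days as datetime.date labels.
dates = [datetime.date(2000, 1, d) for d in range(1, 6)]
df = pd.DataFrame({"A": range(5)}, index=dates)
df.index = pd.to_datetime(df.index)

first = df.index[0]
is_timestamp = isinstance(first, pd.Timestamp)      # pandas' own scalar type
is_datetime = isinstance(first, datetime.datetime)  # Timestamp subclasses datetime

# Every label sits at midnight, which is why no time part is displayed.
all_midnight = (df.index == df.index.normalize()).all()

# The plain dates are still recoverable when needed.
back_to_date = first.date()   # a single datetime.date
plain_dates = df.index.date   # NumPy object array of datetime.date
```

So nothing is lost by converting: the datetime.date values can be recovered from the DatetimeIndex whenever they are needed.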
That said, if you want to keep datetime.date, you can still do so by indexing with datetime.date objects explicitly. For example, to select the values for 2000-01-03, you can use either loc or query:
df = pd.DataFrame({'A': range(5)}, index=pd.date_range('2000-01-01','2000-01-05', 5).date)
df.loc[datetime.date(2000, 1, 3)]
# or
df.query("index == @datetime.date(2000, 1, 3)")
If you need to select all rows between two dates, query is very convenient (between works too):
date1 = datetime.date(2000, 1, 3)
date2 = datetime.date(2000, 1, 5)
df.query("@date1 <= index <= @date2")
# or
df[df.index.to_series().between(date1, date2)]
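Two further options for the same range selection, as a sketch rather than a recommendation: a label slice with loc, and a boolean mask built from the index. The label slice assumes the index is sorted and that both endpoints are actually present in it. The seven-day sample frame is made up for illustration.

```python
import datetime
import pandas as pd

# Illustrative data: one week of datetime.date labels (object dtype index).
dates = [datetime.date(2000, 1, d) for d in range(1, 8)]
df = pd.DataFrame({"A": range(7)}, index=dates)

date1 = datetime.date(2000, 1, 3)
date2 = datetime.date(2000, 1, 5)

# Label-based slice: both endpoints are included.
by_slice = df.loc[date1:date2]

# Element-wise comparison against datetime.date gives a boolean mask.
mask = (df.index >= date1) & (df.index <= date2)
by_mask = df[mask]
```

The mask version does not depend on the index being sorted or on the endpoints existing as labels, so it is the safer of the two for irregular data.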
Upvotes: 0
Reputation: 375675
You should use datetime64/Timestamp rather than datetime.date:
dtf.index = pd.to_datetime(dtf.index)
This gives you a DatetimeIndex, so you can do nifty things like loc by strings:
dtf.loc['2000-01-03']
You won't be able to do that with an index of datetime.date objects.
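Selecting by string also covers partial dates such as a month or a year, which may be the most practical reason to convert. A small sketch, assuming a made-up four-day frame that straddles a month boundary (the column name `value` is hypothetical):

```python
import pandas as pd

# Illustrative data: datetime.date labels from Jan 30 to Feb 2, 2000.
dtf = pd.DataFrame(
    {"value": range(4)},
    index=pd.date_range("2000-01-30", "2000-02-02").date,
)
dtf.index = pd.to_datetime(dtf.index)

january = dtf.loc["2000-01"]    # every row in January 2000
february = dtf.loc["2000-02"]   # every row in February 2000
whole_year = dtf.loc["2000"]    # every row in the year 2000
```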
Upvotes: 6