user3591836

Reputation: 993

Pandas get date from datetime stamp

I'm working with a pandas data frame where the 'date_time' column has values that look like datetime stamps: 2014-02-21 17:16:42

I can call that column using df['date_time'], and I want to search for rows with a particular date. I've been trying something along the lines of

df[(df['date_time']=='2014-02-21')]

but I don't know how to just search for date from the datetime value. Also, I'm not sure if it's relevant, but when I check type(df.date_time[0]) it returns string, instead of some datetime type object.

Thanks a lot.

Upvotes: 3

Views: 2859

Answers (2)

Andy Hayden

Reputation: 375485

It is much more efficient not to use strings here. The column should be datetime64 (if it isn't already, convert it with pd.to_datetime), because string operations have to be evaluated element by element before comparing, and string stuff is slow.

In [11]: s = pd.Series(pd.to_datetime(['2014-02-21 17:16:42', '2014-02-22 17:16:42']))

In [12]: s
Out[12]:
0   2014-02-21 17:16:42
1   2014-02-22 17:16:42
dtype: datetime64[ns]
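Since the question says type(df.date_time[0]) is a string, the column has to be parsed before any of the datetime comparisons here will work. A minimal sketch, assuming a frame shaped like the one in the question (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical frame mirroring the question: 'date_time' holds plain strings.
df = pd.DataFrame({"date_time": ["2014-02-21 17:16:42", "2014-02-22 17:16:42"]})

# Parse the strings once; afterwards the column has a datetime64 dtype
# and each element is a pandas Timestamp rather than a str.
df["date_time"] = pd.to_datetime(df["date_time"])

print(df["date_time"].dtype)
```

Doing the conversion once up front means every later filter works on datetime values instead of re-inspecting strings.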

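You can either do a simple range check (note that the strict < on the lower bound excludes a timestamp at exactly midnight; use <= there if that should count as part of the day):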

In [13]: (pd.Timestamp('2014-02-21') < s) & (s < pd.Timestamp('2014-02-22'))
Out[13]:
0     True
1    False
dtype: bool

In [14]: s.loc[(pd.Timestamp('2014-02-21') < s) & (s < pd.Timestamp('2014-02-22'))]
Out[14]:
0   2014-02-21 17:16:42
dtype: datetime64[ns]
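The same range check can be wrapped up as a half-open interval, so that midnight of the target day is included and midnight of the next day is not. This is only a sketch; rows_on_date is a hypothetical helper name, not something pandas provides:

```python
import pandas as pd

def rows_on_date(s, day):
    """Return the entries of datetime Series `s` that fall on calendar date `day`."""
    start = pd.Timestamp(day).normalize()   # midnight at the start of the day
    end = start + pd.Timedelta(days=1)      # midnight at the start of the next day
    # Half-open interval [start, end): includes 00:00:00 of `day` itself.
    return s[(s >= start) & (s < end)]

s = pd.Series(pd.to_datetime([
    "2014-02-21 00:00:00",
    "2014-02-21 17:16:42",
    "2014-02-22 17:16:42",
]))
result = rows_on_date(s, "2014-02-21")
print(result)
```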

However, it's faster to use DatetimeIndex.normalize (which gets the Timestamp at midnight of each Timestamp):

In [15]: pd.DatetimeIndex(s).normalize()
Out[15]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-02-21, 2014-02-22]
Length: 2, Freq: None, Timezone: None

In [16]: pd.DatetimeIndex(s).normalize() == pd.Timestamp('2014-02-21')
Out[16]: array([ True, False], dtype=bool)

In [17]: s.loc[pd.DatetimeIndex(s).normalize() == pd.Timestamp('2014-02-21')]
Out[17]:
0   2014-02-21 17:16:42
dtype: datetime64[ns]

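Here are some timings (s as above; note that .str.startswith only works on string values, so the first line has to be run on the unconverted strings):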

In [21]: %timeit s.loc[s.str.startswith('2014-02-21')]
1000 loops, best of 3: 1.16 ms per loop

In [22]: %timeit s.loc[(pd.Timestamp('2014-02-21') < s) & (s < pd.Timestamp('2014-02-22'))]
1000 loops, best of 3: 1.23 ms per loop

In [23]: %timeit s.loc[pd.DatetimeIndex(s).normalize() == pd.Timestamp('2014-02-21')]
1000 loops, best of 3: 405 µs per loop

With a slightly larger s the results are more telling:

In [31]: s = pd.Series(pd.to_datetime(['2014-02-21 17:16:42', '2014-02-22 17:16:42'] * 1000))

In [32]: %timeit s.loc[s.str.startswith('2014-02-21')]
10 loops, best of 3: 105 ms per loop

In [33]: %timeit s.loc[(pd.Timestamp('2014-02-21') < s) & (s < pd.Timestamp('2014-02-22'))]
1000 loops, best of 3: 1.3 ms per loop

In [34]: %timeit s.loc[pd.DatetimeIndex(s).normalize() == pd.Timestamp('2014-02-21')]
1000 loops, best of 3: 694 µs per loop

Note: In your example the column df['date_time'] plays the role of s (once it has been converted with pd.to_datetime), and you would be doing df.loc[pd.DatetimeIndex(df['date_time']).normalize() == ...].
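Put together for a DataFrame, this might look like the sketch below. It assumes a pandas version that has the Series .dt accessor, which offers the same normalize() without building a DatetimeIndex by hand; the 'value' column and the sample rows are invented for illustration:

```python
import pandas as pd

# Hypothetical data: 'date_time' starts out as strings, as in the question.
df = pd.DataFrame({
    "date_time": ["2014-02-21 17:16:42", "2014-02-22 17:16:42"],
    "value": [1, 2],
})
df["date_time"] = pd.to_datetime(df["date_time"])

# normalize() sets every timestamp to midnight, so an equality test
# against a date-only Timestamp selects all rows on that date.
mask = df["date_time"].dt.normalize() == pd.Timestamp("2014-02-21")
matches = df.loc[mask]
print(matches)
```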

Upvotes: 3

Narek

Reputation: 616

Since it's a string you can try something along the lines of:

df[df['date_time'].str.startswith('2014-02-21')]
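A self-contained version of this string approach, as a sketch with made-up sample rows. It relies on every value being written in the same zero-padded YYYY-MM-DD HH:MM:SS form, and na=False is added here so that missing values count as non-matches instead of producing NaN in the mask:

```python
import pandas as pd

# Hypothetical frame where 'date_time' is still a column of strings.
df = pd.DataFrame({"date_time": ["2014-02-21 17:16:42", "2014-02-22 17:16:42", None]})

# Prefix match on the date part; na=False treats missing values as False.
mask = df["date_time"].str.startswith("2014-02-21", na=False)
same_day = df[mask]
print(same_day)
```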

Upvotes: 0
