Kavita

Reputation: 125

Getting data for a given day from a pandas DataFrame

I have a dataframe df as below:

date1               item_id
2000-01-01 00:00:00    0
2000-01-01 10:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 12:07:00    7
2000-01-02 00:08:00    8
2000-01-02 00:00:00    0
2000-01-02 00:01:00    1
2000-01-02 03:02:00    2
2000-01-02 00:03:00    3
2000-01-02 00:04:00    4
2000-01-02 00:05:00    5
2000-01-02 04:06:00    6
2000-01-02 00:07:00    7
2000-01-02 00:08:00    8

I need the data for a single day, i.e. 1 January 2000. The query below gives me the correct result, but is there a way to do it just by passing "2000-01-01"?

result = df[(df['date1'] > '2000-01-01 00:00') & (df['date1'] < '2000-01-01 23:59')]

Upvotes: 1

Views: 38

Answers (2)

jezrael

Reputation: 863611

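Use partial string indexing, which requires a DatetimeIndex first. Select the rows with .loc, since recent pandas versions no longer accept a row label in plain df[...]:

df = df.set_index('date1').loc['2000-01-01']
print(df)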
                     item_id
date1                       
2000-01-01 00:00:00        0
2000-01-01 10:01:00        1
2000-01-01 00:02:00        2
2000-01-01 00:03:00        3
2000-01-01 00:04:00        4
2000-01-01 00:05:00        5
2000-01-01 00:06:00        6
2000-01-01 12:07:00        7
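
For anyone who wants to try partial string indexing without overwriting the original frame, here is a minimal self-contained sketch. The sample rows are a shortened, hypothetical copy of the question's data, and the names `indexed` and `jan_first` are only illustrative. Sorting the index is optional here, but it keeps date-based lookups predictable.

```python
import pandas as pd

# Hypothetical sample data mirroring the layout in the question.
df = pd.DataFrame({
    "date1": pd.to_datetime([
        "2000-01-01 00:00:00",
        "2000-01-01 10:01:00",
        "2000-01-01 12:07:00",
        "2000-01-02 00:08:00",
        "2000-01-02 03:02:00",
    ]),
    "item_id": [0, 1, 7, 8, 2],
})

# Move the datetime column into the index and sort it.
indexed = df.set_index("date1").sort_index()

# A date-only string selects every row that falls on that calendar day.
jan_first = indexed.loc["2000-01-01"]
print(jan_first)
```

Because the result is assigned to a new name, `df` still holds both days afterwards.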

Another solution is to convert the datetimes to strings with strftime and filter by boolean indexing:

df = df[df['date1'].dt.strftime('%Y-%m-%d') == '2000-01-01']
print(df)
                date1  item_id
0 2000-01-01 00:00:00        0
1 2000-01-01 10:01:00        1
2 2000-01-01 00:02:00        2
3 2000-01-01 00:03:00        3
4 2000-01-01 00:04:00        4
5 2000-01-01 00:05:00        5
6 2000-01-01 00:06:00        6
7 2000-01-01 12:07:00        7
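
If converting every value to a string feels wasteful, a similar mask can probably be built by truncating each timestamp to midnight with `dt.normalize()` and comparing against a single `Timestamp`. This is a sketch of a swapped-in variant, not part of the approach above; the sample data and the names `day`, `mask` and `result` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the layout in the question.
df = pd.DataFrame({
    "date1": pd.to_datetime([
        "2000-01-01 00:00:00",
        "2000-01-01 10:01:00",
        "2000-01-01 12:07:00",
        "2000-01-02 00:08:00",
    ]),
    "item_id": [0, 1, 7, 8],
})

# The day we want, parsed from the date-only string.
day = pd.Timestamp("2000-01-01")

# normalize() resets the time part to 00:00:00, so rows from the same
# calendar day compare equal to `day`.
mask = df["date1"].dt.normalize() == day
result = df[mask]
print(result)
```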

Upvotes: 3

Anton vBR

Reputation: 18914

Another alternative is to build a boolean mask from the date part:

df[df.date1.dt.date.astype(str) == '2000-01-01']

Full example:

from io import StringIO

import pandas as pd

data = '''\
date1                  item_id
2000-01-01T00:00:00    0
2000-01-01T10:01:00    1
2000-01-01T00:02:00    2
2000-01-01T00:03:00    3
2000-01-01T00:04:00    4
2000-01-01T00:05:00    5
2000-01-01T00:06:00    6
2000-01-01T12:07:00    7
2000-01-02T00:08:00    8
2000-01-02T00:00:00    0
2000-01-02T00:01:00    1
2000-01-02T03:02:00    2'''

# pd.compat.StringIO was removed from pandas; io.StringIO is the replacement.
# The raw string avoids an invalid-escape warning for '\s'.
df = pd.read_csv(StringIO(data), sep=r'\s+', parse_dates=['date1'])

res = df[df.date1.dt.date.astype(str) == '2000-01-01']
print(res)

Returns:

                date1  item_id
0 2000-01-01 00:00:00        0
1 2000-01-01 10:01:00        1
2 2000-01-01 00:02:00        2
3 2000-01-01 00:03:00        3
4 2000-01-01 00:04:00        4
5 2000-01-01 00:05:00        5
6 2000-01-01 00:06:00        6
7 2000-01-01 12:07:00        7

Or compare against a datetime.date object directly:

import datetime
df[df.date1.dt.date == datetime.date(2000,1,1)]
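
Since the question asks for something that works by passing just "2000-01-01", the date comparison can be wrapped in a small helper. This is only a sketch: `filter_by_day` is a hypothetical name, not a pandas function, and it assumes the column already has a datetime dtype and that the day is given in ISO format (YYYY-MM-DD).

```python
import datetime

import pandas as pd


def filter_by_day(frame, column, day):
    """Return the rows of `frame` whose `column` falls on `day` (ISO string)."""
    # Parse the date-only string into a datetime.date object.
    target = datetime.date.fromisoformat(day)
    # .dt.date yields plain date objects, so times of day are ignored.
    return frame[frame[column].dt.date == target]


# Hypothetical sample data mirroring the layout in the question.
df = pd.DataFrame({
    "date1": pd.to_datetime([
        "2000-01-01 00:00:00",
        "2000-01-01 12:07:00",
        "2000-01-02 00:08:00",
    ]),
    "item_id": [0, 7, 8],
})

res = filter_by_day(df, "date1", "2000-01-01")
print(res)
```

Unlike the range query in the question, this keeps the row at exactly 00:00:00 and any rows after 23:59.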

Upvotes: 2
