nilsinelabore

Reputation: 5115

Slicing dataframe by date returned nothing in Python

I have a dataframe df:

          id                timestamp  data  group_id        date
56729  56970  2020-02-01 01:22:52.717  21.0         1  2020-02-01
57135  57376  2020-02-01 14:11:22.633  38.0         3  2020-02-01
57136  57377  2020-02-01 14:11:22.733  39.0         3  2020-02-01
57137  57378  2020-02-01 14:11:23.637  39.0         3  2020-02-01
57138  57379  2020-02-01 14:11:23.737  40.0         3  2020-02-01

and code:

df = df[df['data'] >0]
df['timestamp'] = pd.to_datetime(df['timestamp'])

start_date = pd.to_datetime('2020-02-01 00:00:00')
end_date = pd.to_datetime('2020-03-01 00:00:00')

df = df.loc[(df['timestamp'] > start_date) & (df['timestamp'] < end_date)]

df['date'] = df['timestamp'].dt.date
df = df.sort_values(by=['date'])
df = df[df['date'] == '2020-02-01']

The date column was created from the timestamp so that I can group df by date later on. But the code returned nothing when I sliced df by a specific date, say 2020-02-01, even though there is data for that day. The output looks like this:

    id  timestamp   data    group_id    date

which is only the column names. What is wrong?

Upvotes: 0

Views: 47

Answers (2)

Chetan Ameta

Reputation: 7896

Your df['date'] column holds datetime.date objects, but the line df = df[df['date'] == '2020-02-01'] compares it with a string, so no row matches. Have a look at the solution below:

import pandas as pd

dic = {'timestamp': ['2020-02-01 01:22:52.717', '2020-02-01 01:24:52.717', '2020-02-02 01:22:52.717',
                     '2020-02-03 01:22:52.717']}

df = pd.DataFrame(dic)

df['timestamp'] = pd.to_datetime(df['timestamp'])
print(df['timestamp'])


start_date = pd.to_datetime('2020-02-01 00:00:00')
end_date = pd.to_datetime('2020-03-01 00:00:00')

df = df.loc[(df['timestamp'] > start_date) & (df['timestamp'] < end_date)]

df['date'] = df['timestamp'].dt.date
print(df['date'])
df = df.sort_values(by=['date'])
# compare against a datetime.date, the type stored in df['date']
df = df[df['date'] == pd.to_datetime('2020-02-01').date()]

print(df)

Output:

0   2020-02-01 01:22:52.717
1   2020-02-01 01:24:52.717
2   2020-02-02 01:22:52.717
3   2020-02-03 01:22:52.717
Name: timestamp, dtype: datetime64[ns]
0    2020-02-01
1    2020-02-01
2    2020-02-02
3    2020-02-03
Name: date, dtype: object
                timestamp        date
0 2020-02-01 01:22:52.717  2020-02-01
1 2020-02-01 01:24:52.717  2020-02-01
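
To see the type mismatch directly, here is a minimal sketch. It uses a small made-up frame rather than the asker's data, and the names first, string_mask and date_mask are only illustrative. It inspects what .dt.date actually stores and compares the column against both a string and a datetime.date:

```python
import datetime
import pandas as pd

# Hypothetical sample data, not the asker's dataframe
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2020-02-01 01:22:52.717',
                                 '2020-02-02 01:22:52.717'])
})
df['date'] = df['timestamp'].dt.date

# Each element is a plain Python datetime.date, stored in an object column
first = df['date'].iloc[0]
print(type(first))

# A date object is never equal to a string, so this mask is all False
string_mask = df['date'] == '2020-02-01'
print(string_mask.any())

# Comparing against a datetime.date matches the rows for that day
date_mask = df['date'] == datetime.date(2020, 2, 1)
print(date_mask.tolist())
```

An all-False mask is what produces the empty frame with only column names in the question.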

Upvotes: 1

Sayandip Dutta

Reputation: 15872

Your df['date'] column contains date objects, not strings, so its values will never be equal to '2020-02-01'. You can either do:

>>> df[df['date'] == pd.to_datetime('2020-02-01').date()]

Or,

>>> df[df['date'].astype(str) == '2020-02-01']
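
Since the date column is meant for grouping later, another option (a sketch on made-up data, not something taken from the question) is to keep it as datetime64 by truncating the timestamp with .dt.normalize() instead of using .dt.date. The column can then be compared against a Timestamp and still used as a group key; one_day and daily_mean are illustrative names:

```python
import pandas as pd

# Hypothetical sample data shaped like the question's frame
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2020-02-01 01:22:52.717',
                                 '2020-02-01 14:11:22.633',
                                 '2020-02-02 01:22:52.717']),
    'data': [21.0, 38.0, 40.0],
})

# normalize() sets the time part to midnight but keeps a datetime64 dtype
df['date'] = df['timestamp'].dt.normalize()

# Timestamp-to-Timestamp comparison, so the day filter matches
one_day = df[df['date'] == pd.Timestamp('2020-02-01')]
print(one_day)

# The same column works as a group key
daily_mean = df.groupby('date')['data'].mean()
print(daily_mean)
```

This avoids the object-dtype column altogether, at the cost of the date column displaying as a full timestamp type rather than a plain date.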

Upvotes: 1
