Reputation: 2947
I'm taking the intersection of dates from two dataframes, and filtering the original dataframes (and some other stuff later, so I'm being very explicit).
from datetime import datetime
df1 = pd.DataFrame({"timestamp": [pd.Timestamp('2020-01-01 04:30:00', tz=None),
pd.Timestamp('2020-02-01 04:30:00', tz=None),
pd.Timestamp('2020-03-01 04:30:00', tz=None),
pd.Timestamp('2020-04-01 04:30:00', tz=None)
]})
df2 = pd.DataFrame({"datetime": [datetime(2020, 2, 1),
datetime(2020, 3, 1),
datetime(2020, 4, 1),
datetime(2020, 5, 1)
]})
Get the common dates:
df1_dates = df1['timestamp'].dt.date.unique()
df2_dates = df2['datetime'].dt.date.unique()
common_dates = pd.np.intersect1d(df1_dates, df2_dates)
Each date in common_dates
list has type: datetime.date(2020, 3, 1)
Using the method df.isin()
the boolean mask says no to everything.
df1['timestamp'].isin(common_dates)
0 False
1 False
2 False
3 False
Name: datetime, dtype: bool
However, comparing a single timestamp with a standard library expression x in list
, the timestamp matches:
df1['timestamp'][1] in common_dates
True
I solved my matching problem using the same conversion that made the list:
df1['datetime'].dt.date.isin(common_dates)
0 False
1 True
2 True
3 True
Name: datetime, dtype: bool
My question is, Why does Pandas DataFrame fail membership in date list when the Timestamp element indexed out of the DF is matched successfully?
Upvotes: 0
Views: 111
Reputation: 147196
The values in df1['timestamp']
have time components, so they can't match a date unless the time is 00:00:00
. If you change (for example, the second entry):
df1 = pd.DataFrame({"timestamp": [pd.Timestamp('2020-01-01 04:30:00', tz=None),
pd.Timestamp('2020-02-01 00:00:00', tz=None),
pd.Timestamp('2020-03-01 04:30:00', tz=None),
pd.Timestamp('2020-04-01 04:30:00', tz=None)
]})
Then that value will match:
print(df1['timestamp'].isin(common_dates))
Output:
0 False
1 True
2 False
3 False
Name: timestamp, dtype: bool
Upvotes: 1
Reputation: 323306
This is due the object id is different also , so if we already know the list contain datetime.date object , it is better to convert the compared value to date object as well. In you column it is Timestamp not datetime
df1['timestamp'].dt.date[0]
datetime.date(2020, 1, 1)
df1['timestamp'][0]
Timestamp('2020-01-01 04:30:00')
Ps : It will not work when you convert your list
to set
df1['timestamp'][1] in set(common_dates)
...
False
df1['timestamp'][1] in common_dates
...
True
Upvotes: 0