xtian
xtian

Reputation: 2947

Pandas Timestamp column fails membership in list of dates when element expression is good?

I'm taking the intersection of dates from two dataframes, and filtering the original dataframes (and some other stuff later, so I'm being very explicit).

from datetime import datetime

df1 = pd.DataFrame({"timestamp": [pd.Timestamp('2020-01-01 04:30:00', tz=None),
                               pd.Timestamp('2020-02-01 04:30:00', tz=None),
                               pd.Timestamp('2020-03-01 04:30:00', tz=None),
                               pd.Timestamp('2020-04-01 04:30:00', tz=None)                               
                               ]})

df2 = pd.DataFrame({"datetime": [datetime(2020, 2, 1),
                                 datetime(2020, 3, 1),
                                 datetime(2020, 4, 1),       
                                 datetime(2020, 5, 1)
                               ]})

Get the common dates:

df1_dates = df1['timestamp'].dt.date.unique()
df2_dates = df2['datetime'].dt.date.unique()
common_dates = pd.np.intersect1d(df1_dates, df2_dates)

Each date in common_dates list has type: datetime.date(2020, 3, 1)

Using the method df.isin() the boolean mask says no to everything.

df1['timestamp'].isin(common_dates)
0    False
1    False
2    False
3    False
Name: datetime, dtype: bool

However, comparing a single timestamp with a standard library expression x in list, the timestamp matches:

df1['timestamp'][1] in common_dates
True

I solved my matching problem using the same conversion that made the list:

df1['datetime'].dt.date.isin(common_dates)
0    False
1     True
2     True
3     True
Name: datetime, dtype: bool

My question is, Why does Pandas DataFrame fail membership in date list when the Timestamp element indexed out of the DF is matched successfully?

Upvotes: 0

Views: 111

Answers (2)

Nick
Nick

Reputation: 147196

The values in df1['timestamp'] have time components, so they can't match a date unless the time is 00:00:00. If you change (for example, the second entry):

df1 = pd.DataFrame({"timestamp": [pd.Timestamp('2020-01-01 04:30:00', tz=None),
                               pd.Timestamp('2020-02-01 00:00:00', tz=None),
                               pd.Timestamp('2020-03-01 04:30:00', tz=None),
                               pd.Timestamp('2020-04-01 04:30:00', tz=None)                               
                               ]})

Then that value will match:

print(df1['timestamp'].isin(common_dates))

Output:

0    False
1     True
2    False
3    False
Name: timestamp, dtype: bool

Upvotes: 1

BENY
BENY

Reputation: 323306

This is due the object id is different also , so if we already know the list contain datetime.date object , it is better to convert the compared value to date object as well. In you column it is Timestamp not datetime

df1['timestamp'].dt.date[0]
datetime.date(2020, 1, 1)
df1['timestamp'][0]
Timestamp('2020-01-01 04:30:00')

Ps : It will not work when you convert your list to set

df1['timestamp'][1] in set(common_dates)
... 
False
df1['timestamp'][1] in common_dates
... 
True 

Upvotes: 0

Related Questions