Ishan Dutta
Ishan Dutta

Reputation: 957

How to compare 2 Dataframes of unequal length based on Datetime column

I have 2 Dataframes and I want to remove the rows from badges dataframe who have the value of Date greater than the respective UserId's date field from test_df dataframe. The problem is that both dataframes have multiple values for UserId and the total number of rows of both dataframes are unequal. The code that I used works only if both dataframes have equal number of rows, otherwise it produces an error.

Part of badges

UserId | Date
101    | 2009-09-01 15:17:50.660
101    | 2009-09-01 15:17:50.660
101    | 2009-09-02 15:17:50.660
102    | 2009-09-03 15:17:50.660
103    | 2013-09-30 15:17:50.660
103    | 2013-09-30 15:17:50.660
104    | 2009-09-30 15:17:50.660

Part of test_df

UserId | date
101    | 2010-09-01 15:17:50.660
101    | 2010-09-02 15:17:50.660
101    | 2010-09-03 15:17:50.660
102    | 2010-09-30 15:17:50.660
103    | 2010-09-30 15:17:50.660
103    | 2010-09-30 15:17:50.660

Code

badges = badges[(badges.Date < test_df.date)]

Error

ValueError: Can only compare identically-labeled Series objects

Required Output

badges

UserId     | Date
    101    | 2009-09-01 15:17:50.660
    101    | 2009-09-01 15:17:50.660
    101    | 2009-09-02 15:17:50.660
    102    | 2009-09-03 15:17:50.660
    104    | 2009-09-30 15:17:50.660

Upvotes: 1

Views: 159

Answers (1)

sammywemmy
sammywemmy

Reputation: 28669

Convert the dates to datetime :

badges["Date"] = pd.to_datetime(badges["Date"])
test_df["date"] = pd.to_datetime(test_df["date"])

Iterate to get values where badges date is less than test_df date : from itertools import product

valid_dates = {
    badge_date
    for badge_date, test_df_date in product(badges.Date.unique(), test_df.date.unique())
    if badge_date < test_df_date
}

Filter badges with valid_dates :

badges.loc[badges["Date"].isin(valid_dates)]

    UserId  Date
0   101     2009-09-01 15:17:50.660
1   101     2009-09-01 15:17:50.660
2   101     2009-09-02 15:17:50.660
3   102     2009-09-03 15:17:50.660
6   104     2009-09-30 15:17:50.660

Upvotes: 1

Related Questions