Reputation: 957
I have 2 Dataframes and I want to remove the rows from badges
dataframe who have the value of Date
greater than the respective UserId's
date
field from test_df
dataframe. The problem is that both dataframes have multiple values for UserId
and the total number of rows of both dataframes are unequal. The code that I used works only if both dataframes have equal number of rows, otherwise it produces an error.
Part of badges
UserId | Date
101 | 2009-09-01 15:17:50.660
101 | 2009-09-01 15:17:50.660
101 | 2009-09-02 15:17:50.660
102 | 2009-09-03 15:17:50.660
103 | 2013-09-30 15:17:50.660
103 | 2013-09-30 15:17:50.660
104 | 2009-09-30 15:17:50.660
Part of test_df
UserId | date
101 | 2010-09-01 15:17:50.660
101 | 2010-09-02 15:17:50.660
101 | 2010-09-03 15:17:50.660
102 | 2010-09-30 15:17:50.660
103 | 2010-09-30 15:17:50.660
103 | 2010-09-30 15:17:50.660
Code
badges = badges[(badges.Date < test_df.date)]
Error
ValueError: Can only compare identically-labeled Series objects
Required Output
badges
UserId | Date
101 | 2009-09-01 15:17:50.660
101 | 2009-09-01 15:17:50.660
101 | 2009-09-02 15:17:50.660
102 | 2009-09-03 15:17:50.660
104 | 2009-09-30 15:17:50.660
Upvotes: 1
Views: 159
Reputation: 28669
Convert the dates to datetime :
badges["Date"] = pd.to_datetime(badges["Date"])
test_df["date"] = pd.to_datetime(test_df["date"])
Iterate to get values where badges date is less than test_df date : from itertools import product
valid_dates = {
badge_date
for badge_date, test_df_date in product(badges.Date.unique(), test_df.date.unique())
if badge_date < test_df_date
}
Filter badges with valid_dates :
badges.loc[badges["Date"].isin(valid_dates)]
UserId Date
0 101 2009-09-01 15:17:50.660
1 101 2009-09-01 15:17:50.660
2 101 2009-09-02 15:17:50.660
3 102 2009-09-03 15:17:50.660
6 104 2009-09-30 15:17:50.660
Upvotes: 1