Getting a slice from pandas dataframe by comparing contents of df1 with df2

Question

I have two dataframes, for each id in df1 I need to pick rows from df2 with same id, filter out rows where df2.ApplicationDate < df1.ApplicationDate and count how many of such rows exist.

Here is how I am doing it currently:

for i, row in df1.iterrows():
    count = len(df2[(df2['PersonalId']==row['PersonalId']) 
                        & (df2['ApplicationDate'] < row['ApplicationDate']])
    counts.append(count)

This approach works but its hellishly slow on large dataframes, is there any way to accelerate it?

Edit: added sample input with expected output

df1:                  
   Id ApplicationDate
0   1         5-12-20
1   2         6-12-20
2   3         7-12-20
3   4         8-12-20
4   5         9-12-20
5   6        10-12-20
df2:
   Id ApplicationDate
0   1         4-11-20
1   1         4-12-20
2   3         5-12-20
3   3         8-12-20
4   5         1-12-20

expected counts for each id: [2, 0, 1, 0, 1, 0]

sammywemmy · Accepted Answer

df1.merge(df2, on="Id", how="left").assign(
    temp=lambda x: x.ApplicationDate_y.notna(),
    tempr=lambda x: x.ApplicationDate_x > x.ApplicationDate_y,
    counter=lambda x: x.temp & x.tempr,
).groupby("Id").counter.sum()

Id
1    2
2    0
3    1
4    0
5    1
6    0
Name: counter, dtype: int64

The code above merges the dataframe and then uses the sum of the conditions based on the groupby to get the count.

Getting a slice from pandas dataframe by comparing contents of df1 with df2

Answers (2)

Related Questions