Reputation: 163
I am looking to automate the reconciliation of bank transactions. There are 2 tables, the bank table & the system table, whereby the transactions in the system table are delayed by few days. The tables vary in length and do not have 1:1 match for transactions.
The problem is to find a consistent method to identify and group reconciling and non-reconciling transactions both ways. The first challenge I have encountered is to join/merge tables based on date range & the amount. Pandas.merge_asof would be suitable to join based on date range, but it is limited to 1 column-based join.
See example tables below:
bankdf = pd.DataFrame({'BankDate': pd.date_range('2018-12-28', periods=10, freq='3D'), 'Amount': np.array([140,107,132,188,75,152,88,159,132,107])})
systemdf = pd.DataFrame({'SystemCreditDate': pd.date_range('2019-01-04', periods=9, freq='3D'), 'Amount': np.array([107,132,190,75,152,88,110,132,132])})
bankdf
Out[119]:
Amount BankDate
0 140 2018-12-28
1 107 2018-12-31
2 132 2019-01-03
3 188 2019-01-06
4 75 2019-01-09
5 152 2019-01-12
6 88 2019-01-15
7 159 2019-01-18
8 132 2019-01-21
9 107 2019-01-24
systemdf
Out[120]:
Amount SystemCreditDate
0 107 2019-01-04
1 132 2019-01-07
2 190 2019-01-10
3 75 2019-01-13
4 152 2019-01-16
5 88 2019-01-19
6 110 2019-01-22
7 132 2019-01-25
8 132 2019-01-28
The 2 tables will need to be joined based on where 'Amount' matches AND date difference is less than 6 days (SystemCreditDate - BankDate) < 6).
The final result should look something like this:
Amount BankDate SystemCreditDate
1 107 2018-12-31 2019-01-04
2 132 2019-01-03 2019-01-07
3 75 2019-01-09 2019-01-13
4 152 2019-01-12 2019-01-16
5 88 2019-01-15 2019-01-19
6 132 2019-01-21 2019-01-25
Upvotes: 1
Views: 283
Reputation: 13255
Use DataFrame.merge
and remove the rows which don't follow the rule:
df = bankdf.merge(systemdf)
mask = (df['SystemCreditDate']-df['BankDate']).abs().dt.days<6
df = df.loc[mask, :]
print(df)
BankDate Amount SystemCreditDate
0 2018-12-31 107 2019-01-04
2 2019-01-03 132 2019-01-07
6 2019-01-21 132 2019-01-25
8 2019-01-09 75 2019-01-13
9 2019-01-12 152 2019-01-16
10 2019-01-15 88 2019-01-19
OR to remove negative days:
df = bankdf.merge(systemdf)
mask = (df['SystemCreditDate']-df['BankDate']).dt.days
mask = mask.le(6) & ~mask.lt(0)
df = df.loc[mask, :]
Upvotes: 2