Reputation: 445
I have some customer transaction data and would like to remove the rows where customer returned the product and got a refund.
I have a dataframe called trx which contains product_key (the product code), the gross_sales (amt spent), member_id (member identification) and trx_date (date of transaction).
I would like to obtain all the rows for each unique member where the positive value in the gross sales (e.g., 21.30 for member 22444) matches the negative value in the gross sales (e.g., -21.30 for member 22444). This will allow me to identify if the product is refunded within 14 days.
I have tried the following code but it is taking too long to compile.
product key gross_sales member_id trx_date
1001 14.50 10024 20-12-2018
1001 14.50 10024 20-12-2018
1002 21.30 22444 10-10-2018
1003 11.10 516 05-10-2018
1002 -21.30 22444 23-10-2018
1005 5.50 1800 01-09-2018
1006 8.30 4221 01-09-2018
1003 -11.10 516 06-10-2018
neg_gross_sales = trx.gross_sales[trx.gross_sales < 0]
pos_gross_sales = trx.gross_sales[trx.gross_sales > 0]
matching_vals = []
for i in trx['member_id'].unique():
for j in trx['gross_sales']:
if j in abs(neg_gross_sales):
if j in abs(pos_gross_sales):
matching_vals.append(j)
else:
continue
else:
continue
print (matching_vals)
I would like my output to look like this
product key gross_sales member_id trx_date
1002 21.30 22444 10-10-2018
1002 -21.30 22444 23-10-2018
1003 11.10 516 05-10-2018
1003 -11.10 516 06-10-2018
Was wondering if there was something wrong with my code or if there was a better way to do this, appreciate any form of help, thank you!
Upvotes: 1
Views: 995
Reputation: 8033
One more approach
a= df.loc[df['gross_sales'].le(0)]
b = df.loc[(df.gross_sales.isin(abs(a.gross_sales))) & (df.member_id.isin(a.member_id))]
a.append(b)
output
product key gross_sales member_id trx_date
4 1002 -21.3 22444 23-10-2018
7 1003 -11.1 516 06-10-2018
2 1002 21.3 22444 10-10-2018
3 1003 11.1 516 05-10-2018
Upvotes: 0
Reputation: 9019
Here's an easy way using a boolean mask and transform()
:
df[df.groupby(['member_id','product key'])['gross_sales'].transform('sum').eq(0)]
Yields:
product key gross_sales member_id trx_date
2 1002 21.3 22444 2018-10-10
3 1003 11.1 516 2018-05-10
4 1002 -21.3 22444 2018-10-23
7 1003 -11.1 516 2018-06-10
From here, you can then apply your 14-day return policy window. Possibly something like this (assuming your trx_date
column is datetime
-type):
df[df.groupby(['member_id','product key'])['trx_date'].diff().dt.days.lt(14)]
Yields:
product key gross_sales member_id trx_date
4 1002 -21.3 22444 2018-10-23
Upvotes: 6
Reputation: 29742
One way using any
with numpy.isclose
:
refunded_member = df.groupby("member_id")["gross_sales"].apply(lambda x: any(np.isclose(-i, x).any() for i in x))
new_df = df.set_index('member_id')[refunded_member].reset_index()
print(new_df)
Output:
member_id product_key gross_sales trx_date
0 22444 1002 21.3 10-10-2018
1 516 1003 11.1 05-10-2018
2 22444 1002 -21.3 23-10-2018
3 516 1003 -11.1 06-10-2018
Upvotes: 1