wjie08
wjie08

Reputation: 445

Looking for positive values that match negative values within a column Pandas

I have some customer transaction data and would like to remove the rows where customer returned the product and got a refund.

I have a dataframe called trx which contains product_key (the product code), the gross_sales (amt spent), member_id (member identification) and trx_date (date of transaction).

I would like to obtain all the rows for each unique member where the positive value in the gross sales (e.g., 21.30 for member 22444) matches the negative value in the gross sales (e.g., -21.30 for member 22444). This will allow me to identify if the product is refunded within 14 days.

I have tried the following code but it is taking too long to compile.

product key     gross_sales      member_id      trx_date
1001            14.50            10024          20-12-2018
1001            14.50            10024          20-12-2018 
1002            21.30            22444          10-10-2018
1003            11.10            516            05-10-2018
1002           -21.30            22444          23-10-2018
1005            5.50             1800           01-09-2018
1006            8.30             4221           01-09-2018
1003           -11.10            516            06-10-2018


neg_gross_sales = trx.gross_sales[trx.gross_sales < 0]
pos_gross_sales = trx.gross_sales[trx.gross_sales > 0]

matching_vals = []

for i in trx['member_id'].unique():
    for j in trx['gross_sales']:
        if j in abs(neg_gross_sales):
            if j in abs(pos_gross_sales):
                matching_vals.append(j)
            else:
                continue
        else:
            continue
print (matching_vals)

I would like my output to look like this

product key     gross_sales      member_id      trx_date
1002            21.30            22444          10-10-2018
1002           -21.30            22444          23-10-2018
1003            11.10            516            05-10-2018
1003           -11.10            516            06-10-2018

Was wondering if there was something wrong with my code or if there was a better way to do this, appreciate any form of help, thank you!

Upvotes: 1

Views: 995

Answers (3)

moys
moys

Reputation: 8033

One more approach

a=  df.loc[df['gross_sales'].le(0)]
b = df.loc[(df.gross_sales.isin(abs(a.gross_sales))) & (df.member_id.isin(a.member_id))]
a.append(b)

output

 product key    gross_sales     member_id   trx_date
4   1002              -21.3     22444       23-10-2018
7   1003              -11.1     516         06-10-2018
2   1002               21.3     22444       10-10-2018
3   1003               11.1     516         05-10-2018

Upvotes: 0

rahlf23
rahlf23

Reputation: 9019

Here's an easy way using a boolean mask and transform():

df[df.groupby(['member_id','product key'])['gross_sales'].transform('sum').eq(0)]

Yields:

   product key  gross_sales  member_id   trx_date
2         1002         21.3      22444 2018-10-10
3         1003         11.1        516 2018-05-10
4         1002        -21.3      22444 2018-10-23
7         1003        -11.1        516 2018-06-10

From here, you can then apply your 14-day return policy window. Possibly something like this (assuming your trx_date column is datetime-type):

df[df.groupby(['member_id','product key'])['trx_date'].diff().dt.days.lt(14)]

Yields:

   product key  gross_sales  member_id   trx_date
4         1002        -21.3      22444 2018-10-23

Upvotes: 6

Chris
Chris

Reputation: 29742

One way using any with numpy.isclose:

refunded_member = df.groupby("member_id")["gross_sales"].apply(lambda x: any(np.isclose(-i, x).any() for i in x))
new_df = df.set_index('member_id')[refunded_member].reset_index()
print(new_df)

Output:

   member_id  product_key  gross_sales    trx_date
0      22444         1002         21.3  10-10-2018
1        516         1003         11.1  05-10-2018
2      22444         1002        -21.3  23-10-2018
3        516         1003        -11.1  06-10-2018

Upvotes: 1

Related Questions