Reputation: 115
I have a DF in which I'm analyzing customer total price. I'm finding the total price a customer paid by summing discount (since it is already a negative number) from sales value of the product.
transaction_df_clean['customer_price'] = transaction_df_clean['sales_value'] + transaction_df_clean['coupon_disc']
Consequently, some total prices are negative and I want to change these to 0 to avoid negative numbers.
Input:
transaction_df_clean.loc[transaction_df_clean['customer_price'] < 0].head(10)
Output (showing 1 row):
index household_key basket_id day product_id quantity sales_value store_id retail_disc trans_time week_no coupon_disc coupon_match_disc customer_price
----------
13895 988 27282152470 25 1088634 2 1.00 408 -0.98 2353 4 -1.49 0.00 -0.49
However, when trying to change the negative values to 0 in the "customer_price" columns, other columns that are not being targeted change to 0 too.
Input:
transaction_df_clean.loc[transaction_df_clean['customer_price'] < 0] = 0
transaction_df_clean.loc[transaction_df_clean['customer_price'] == 0].head(20)
Output:
index household_key basket_id day product_id quantity sales_value store_id retail_disc trans_time week_no coupon_disc coupon_match_disc customer_price
----------
13895 0 0 0 0 0 0.00 0 0.00 0 0 0.00 0.00 0.0
Any ideas why this might be occurring?
Upvotes: 2
Views: 930
Reputation: 939
What this, transaction_df_clean.loc[transaction_df_clean['customer_price'] < 0] = 0
, is actually doing is applying the condition to the entire dataframe and when you put = 0
the 0 gets broadcasted to all the points of data. You're telling it to select all the rows in your dataframe where customer_price is less than 0 then change all the filtered rows to 0.
Aside from applying the condition you have to select the column/series that you want to change.
How I remember to use .loc
is df.loc[row filter/selection, column filter/selection]
Another way to do it would be
transaction_df_clean.loc[transaction_df_clean['customer_price'] < 0,'customer_price'] = 0
There is a good section in the docs about setting values called Setting Values https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html
Upvotes: 1
Reputation: 6091
you could use numpy.max
to handle it (pandas
also has a max
but it's less-intuitive since it defaults to aggregating data instead of working per-row)
import numpy as np
transaction_df_clean['customer_price'] = np.max(0, transaction_df_clean['sales_value'] + transaction_df_clean['coupon_disc'])
this way there won't be any negative numbers
Upvotes: 0
Reputation: 30920
You also need select the column.
m = transaction_df_clean['customer_price'] < 0
transaction_df_clean.loc[m,'customer_price']=0
I would use Series.clip
.
transaction_df_clean['customer_price'] = transaction_df_clean['customer_price'].clip(lower=0)
We could also use Series.mask
.
transaction_df_clean['customer_price']=transaction_df_clean['customer_price'].mask(m,0)
So you need Series.add
+ Series.clip
:
transaction_df_clean['customer_price'] = transaction_df_clean['sales_value'].add(transaction_df_clean['coupon_disc']).clip(lower=0)
Upvotes: 2