CJL89
CJL89

Reputation: 115

Changing Negative Values to 0, Without Changing other Columns

I have a DF in which I'm analyzing customer total price. I'm finding the total price a customer paid by summing discount (since it is already a negative number) from sales value of the product.

transaction_df_clean['customer_price'] = transaction_df_clean['sales_value'] + transaction_df_clean['coupon_disc']

Consequently, some total prices are negative and I want to change these to 0 to avoid negative numbers.

Input:

transaction_df_clean.loc[transaction_df_clean['customer_price'] < 0].head(10)

Output (showing 1 row):

index   household_key   basket_id   day product_id  quantity    sales_value store_id    retail_disc trans_time  week_no coupon_disc coupon_match_disc   customer_price
----------
13895   988  27282152470     25  1088634    2   1.00    408 -0.98   2353    4   -1.49   0.00    -0.49

However, when trying to change the negative values to 0 in the "customer_price" columns, other columns that are not being targeted change to 0 too.

Input:

transaction_df_clean.loc[transaction_df_clean['customer_price'] < 0] = 0
transaction_df_clean.loc[transaction_df_clean['customer_price'] == 0].head(20)

Output:

index household_key basket_id   day product_id  quantity    sales_value store_id    retail_disc trans_time  week_no coupon_disc coupon_match_disc   customer_price
----------
13895   0   0   0   0   0   0.00    0   0.00    0   0   0.00    0.00    0.0

Any ideas why this might be occurring?

Upvotes: 2

Views: 930

Answers (3)

fcsr
fcsr

Reputation: 939

What this, transaction_df_clean.loc[transaction_df_clean['customer_price'] < 0] = 0, is actually doing is applying the condition to the entire dataframe and when you put = 0 the 0 gets broadcasted to all the points of data. You're telling it to select all the rows in your dataframe where customer_price is less than 0 then change all the filtered rows to 0.

Aside from applying the condition you have to select the column/series that you want to change.

How I remember to use .loc is df.loc[row filter/selection, column filter/selection]

Another way to do it would be

transaction_df_clean.loc[transaction_df_clean['customer_price'] < 0,'customer_price'] = 0

There is a good section in the docs about setting values called Setting Values https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

Upvotes: 1

Yuca
Yuca

Reputation: 6091

you could use numpy.max to handle it (pandas also has a max but it's less-intuitive since it defaults to aggregating data instead of working per-row)

import numpy as np 

transaction_df_clean['customer_price'] = np.max(0, transaction_df_clean['sales_value'] + transaction_df_clean['coupon_disc'])

this way there won't be any negative numbers

Upvotes: 0

ansev
ansev

Reputation: 30920

You also need select the column.

m = transaction_df_clean['customer_price'] < 0
transaction_df_clean.loc[m,'customer_price']=0

I would use Series.clip.

transaction_df_clean['customer_price'] = transaction_df_clean['customer_price'].clip(lower=0)

We could also use Series.mask.

transaction_df_clean['customer_price']=transaction_df_clean['customer_price'].mask(m,0)

So you need Series.add + Series.clip:

transaction_df_clean['customer_price'] = transaction_df_clean['sales_value'].add(transaction_df_clean['coupon_disc']).clip(lower=0)

Upvotes: 2

Related Questions