silent_hunter

Reputation: 2508

Operators and multiple conditions in pandas

I am implementing my own function for calculating taxes. My intention is to solve this problem with a single function. Below you can see the data:

import pandas as pd

df = pd.DataFrame({"id_n": ["1", "2", "3", "4", "5"],
                   "sales1": [0, 115000, 440000, 500000, 740000],
                   "sales2": [0, 115000, 460000, 520000, 760000],
                   "tax": [0, 8050, 57500, 69500, 69500]})

Now I want to write a tax function that reproduces the values in the tax column. Below you can see my attempt at that function:

# Thresholds
min_threeshold = 500000
max_threeshold = 1020000

# Maximum taxes
max_cap = 69500

# Rates
rate_1 = 0.035
rate_2 = 0.1

# Total sales
total_sale = df['sales1'] + df['sales2']
tax = df['tax']

# Function for estimation
def tax_fun(total_sale, tax, min_threeshold, max_threeshold, max_cap, rate_1, rate_2):
    if (total_sale > 0 and tax == 0):     # <---- This line of code
        calc_tax = 0
    elif (total_sale < min_threeshold):
        calc_tax = total_sale * rate_1
    elif (total_sale >= min_threeshold) & (total_sale <= max_threeshold):
        calc_tax = total_sale * rate_2
    elif (total_sale > max_threeshold):
        calc_tax = max_cap
    return calc_tax

The next step is to execute the function above. I want all of the results in one column:

df['new_tax']=tax_fun(total_sale,tax,min_threeshold,max_threeshold,max_cap,rate_1,rate_2)

After executing this command, I received this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So the error probably happens in the condition (total_sale > 0 and tax == 0) on the marked line, and for that reason the function cannot be executed.
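A minimal reproduction, assuming small made-up Series in place of the question's data, suggests the marked condition is indeed the culprit: each comparison returns a boolean Series, and the and operator then asks that Series for a single True/False, which pandas refuses to provide.

```python
import pandas as pd

# Made-up stand-ins for the question's total_sale and tax Series (assumption)
total_sale = pd.Series([0, 230000, 900000])
tax = pd.Series([0, 8050, 0])

error_message = ""
try:
    # Each comparison gives a boolean Series; `and` then calls bool() on the
    # left-hand Series, which raises instead of guessing a single value
    (total_sale > 0) and (tax == 0)
except ValueError as err:
    error_message = str(err)

print(error_message)
```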

Can anybody help me solve this problem?

Upvotes: 0

Views: 53

Answers (1)

ko3

Reputation: 1786

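The error occurs because total_sale > 0 compares a Series (a collection of values) with a single integer. The result is not one True/False but a boolean Series with one value per row, while Python's and operator and the if statement both need a single truth value, which pandas refuses to guess.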

Your variable total_sale has the following form:

0          0
1     230000
2     900000
3    1020000
4    1500000
dtype: int64

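You cannot treat the result of comparing this Series with zero as a single condition. You must either evaluate the condition for each element separately (0, 230000, 900000, etc.) or reduce it explicitly, for example with .any() or .all(), to check whether any or all entries satisfy it.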
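The two options above can be sketched with the total_sale values shown here and the question's tax column: element-wise comparisons combined with & (each side in parentheses), or an explicit reduction with .any() / .all().

```python
import pandas as pd

# total_sale as printed above (sales1 + sales2) and the question's tax column
total_sale = pd.Series([0, 230000, 900000, 1020000, 1500000])
tax = pd.Series([0, 8050, 57500, 69500, 69500])

# Option 1: element by element, one boolean per row
positive = total_sale > 0
print(positive.tolist())  # [False, True, True, True, True]

# Element-wise conditions are combined with & (not `and`), each in parentheses
zero_tax_on_positive_sale = (total_sale > 0) & (tax == 0)
print(zero_tax_on_positive_sale.tolist())  # [False, False, False, False, False]

# Option 2: reduce the boolean Series to a single True/False
print(positive.any())  # True  (at least one row is positive)
print(positive.all())  # False (row 0 is zero)
```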

I think you want something like this:

import numpy as np

def tax_fun(total_sale, tax, min_threeshold, max_threeshold, max_cap, rate_1, rate_2):
    # One output slot per row; every row is filled by one of the masks below
    calc_tax = np.empty(shape=total_sale.shape)
    low = total_sale < min_threeshold
    mid = (total_sale >= min_threeshold) & (total_sale <= max_threeshold)
    calc_tax[low] = total_sale[low] * rate_1
    calc_tax[mid] = total_sale[mid] * rate_2
    calc_tax[total_sale > max_threeshold] = max_cap
    # Applied last so it overrides the rules above, like the first `if` in the question
    calc_tax[(total_sale > 0) & (tax == 0)] = 0
    return calc_tax

df['new_tax'] = tax_fun(total_sale,tax,min_threeshold,max_threeshold,max_cap,rate_1,rate_2)

print(df)

----------------------------------------------------
    id_n    sales1  sales2  tax    new_tax
0   1       0       0       0       0.0
1   2       115000  115000  8050    8050.0
2   3       440000  460000  57500   90000.0
3   4       500000  520000  69500   102000.0
4   5       740000  760000  69500   69500.0
----------------------------------------------------
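If you would rather keep the question's if/elif function, one option is to call it once per row so that it only ever sees single values. This is a sketch under that assumption; tax_fun_scalar is a hypothetical name for a copy of the question's function with the same branches, and the thresholds and rates are copied from the question.

```python
import pandas as pd

df = pd.DataFrame({"id_n": ["1", "2", "3", "4", "5"],
                   "sales1": [0, 115000, 440000, 500000, 740000],
                   "sales2": [0, 115000, 460000, 520000, 760000],
                   "tax": [0, 8050, 57500, 69500, 69500]})

min_threeshold, max_threeshold = 500000, 1020000
max_cap = 69500
rate_1, rate_2 = 0.035, 0.1

# Hypothetical scalar version: same branches as the question's tax_fun
def tax_fun_scalar(total_sale, tax, min_threeshold, max_threeshold, max_cap, rate_1, rate_2):
    if total_sale > 0 and tax == 0:
        calc_tax = 0
    elif total_sale < min_threeshold:
        calc_tax = total_sale * rate_1
    elif total_sale <= max_threeshold:
        calc_tax = total_sale * rate_2
    else:
        calc_tax = max_cap
    return calc_tax

total_sale = df["sales1"] + df["sales2"]

# zip yields one scalar per row, so `and` and `if` receive single values
df["new_tax"] = [
    tax_fun_scalar(s, t, min_threeshold, max_threeshold, max_cap, rate_1, rate_2)
    for s, t in zip(total_sale, df["tax"])
]
print(df)
```

This loops in Python, so it will be slower than boolean indexing on large frames, but the if/elif precedence stays exactly as written. For this data it should give the same new_tax values as the indexing version above.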

I would use boolean indexing instead of if and else conditions, because each mask selects all matching rows at once.
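A related vectorized option, swapped in here and not part of the answer above, is NumPy's np.select: it takes the conditions in order and, for each row, picks the choice belonging to the first condition that is True, which mirrors the if/elif chain in the question. The thresholds and rates are copied from the question; using default=np.nan to flag rows no condition covers is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id_n": ["1", "2", "3", "4", "5"],
                   "sales1": [0, 115000, 440000, 500000, 740000],
                   "sales2": [0, 115000, 460000, 520000, 760000],
                   "tax": [0, 8050, 57500, 69500, 69500]})

min_threeshold, max_threeshold = 500000, 1020000
max_cap = 69500
rate_1, rate_2 = 0.035, 0.1

total_sale = df["sales1"] + df["sales2"]
tax = df["tax"]

# Same order as the if/elif chain: the first True condition wins per row
conditions = [
    (total_sale > 0) & (tax == 0),
    total_sale < min_threeshold,
    total_sale <= max_threeshold,
    total_sale > max_threeshold,
]
choices = [
    0,
    total_sale * rate_1,
    total_sale * rate_2,
    max_cap,
]

# Scalars and Series broadcast together; NaN marks rows no condition matched
df["new_tax"] = np.select(conditions, choices, default=np.nan)
print(df)
```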

Upvotes: 1
