Reputation: 2508
I am implementing my own function for calculating taxes. My intention is to solve this problem with a single function. Below you can see the data:
import pandas as pd

df = pd.DataFrame({"id_n": ["1", "2", "3", "4", "5"],
                   "sales1": [0, 115000, 440000, 500000, 740000],
                   "sales2": [0, 115000, 460000, 520000, 760000],
                   "tax": [0, 8050, 57500, 69500, 69500]
                   })
Now I want to introduce a tax function that gives the same results as the tax column. Below you can see my attempt at that function:
# Thresholds
min_threeshold = 500000
max_threeshold = 1020000
# Maximum taxes
max_cap = 69500
# Rates
rate_1 = 0.035
rate_2 = 0.1
# Total sales
total_sale = df['sales1'] + df['sales2']
tax = df['tax']
# Function for estimation
def tax_fun(total_sale, tax, min_threeshold, max_threeshold, max_cap, rate_1, rate_2):
    if (total_sale > 0 and tax == 0):  # <---- This line of code
        calc_tax = 0
    elif (total_sale < min_threeshold):
        calc_tax = total_sale * rate_1
    elif (total_sale >= min_threeshold) & (total_sale <= max_threeshold):
        calc_tax = total_sale * rate_2
    elif (total_sale > max_threeshold):
        calc_tax = max_cap
    return calc_tax
The next step is to execute the function above. I want to have all of the results in one column.
df['new_tax']=tax_fun(total_sale,tax,min_threeshold,max_threeshold,max_cap,rate_1,rate_2)
After executing this command, I received this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So the error probably happens in this line, and for that reason the function cannot be executed: (total_sale > 0 and tax == 0)
Can anybody help me solve this problem?
Upvotes: 0
Views: 53
Reputation: 1786
The error occurs because if and and need a single True or False, but comparing a Series (a collection of values) with a single integer gives a Series of booleans, one per row.
Your variable total_sale has the following form:
0 0
1 230000
2 900000
3 1020000
4 1500000
dtype: int64
You cannot use this Series as a single condition. You must either test each element against zero individually (0, 230000, 900000, etc.) or reduce the result to one value, for example by checking whether any entry satisfies your condition.
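To illustrate the point, here is a minimal sketch using the same totals as above. The names mask, raised and combined are only there for illustration and are not part of the question's code.

```python
import pandas as pd

total_sale = pd.Series([0, 230000, 900000, 1020000, 1500000])
tax = pd.Series([0, 8050, 57500, 69500, 69500])

# Comparing a Series with a scalar is fine: it yields one boolean per row.
mask = total_sale > 0

# `and` (like `if`) needs a single True/False, so it raises on a Series.
try:
    (total_sale > 0) and (tax == 0)
    raised = False
except ValueError:
    raised = True

# `&` combines the two boolean Series element by element instead.
combined = (total_sale > 0) & (tax == 0)

print(mask.tolist())
print(raised)
print(combined.tolist())
```

The parentheses around each comparison matter, because & binds more tightly than > and ==.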
I think you want something like this:
import numpy as np

def tax_fun(total_sale, tax, min_threeshold, max_threeshold, max_cap, rate_1, rate_2):
    calc_tax = np.empty(shape=total_sale.shape)
    calc_tax[(total_sale < min_threeshold)] = total_sale[(total_sale < min_threeshold)] * rate_1
    calc_tax[(total_sale >= min_threeshold) & (total_sale <= max_threeshold)] = total_sale[(total_sale >= min_threeshold) & (total_sale <= max_threeshold)] * rate_2
    calc_tax[(total_sale > max_threeshold)] = max_cap
    # Assigned last so it takes precedence, like the first branch of the if/elif chain
    calc_tax[(total_sale > 0) & (tax == 0)] = 0
    return calc_tax
df['new_tax'] = tax_fun(total_sale,tax,min_threeshold,max_threeshold,max_cap,rate_1,rate_2)
print(df)
----------------------------------------------------
  id_n  sales1  sales2    tax   new_tax
0    1       0       0      0       0.0
1    2  115000  115000   8050    8050.0
2    3  440000  460000  57500   90000.0
3    4  500000  520000  69500  102000.0
4    5  740000  760000  69500   69500.0
----------------------------------------------------
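Note that new_tax differs from the tax column in rows 2 and 3. The tax values in the sample data would be reproduced if rate_2 applied only to the part of the total above min_threeshold, with the result capped at max_cap. That is an assumption inferred from the five sample rows, not something the question states. A minimal sketch under that assumption follows; tax_fun_marginal and marginal_tax are hypothetical names, and the sketch leaves out the tax == 0 override.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id_n": ["1", "2", "3", "4", "5"],
                   "sales1": [0, 115000, 440000, 500000, 740000],
                   "sales2": [0, 115000, 460000, 520000, 760000],
                   "tax": [0, 8050, 57500, 69500, 69500]})

min_threeshold = 500000
max_cap = 69500
rate_1 = 0.035
rate_2 = 0.1

def tax_fun_marginal(total_sale, min_threeshold, max_cap, rate_1, rate_2):
    # ASSUMPTION: rate_1 applies to the part up to the lower threshold ...
    lower_part = np.minimum(total_sale, min_threeshold) * rate_1
    # ... and rate_2 only to the part above it (never negative).
    upper_part = np.maximum(total_sale - min_threeshold, 0) * rate_2
    # The result is capped at max_cap.
    return np.minimum(lower_part + upper_part, max_cap)

total_sale = df["sales1"] + df["sales2"]
df["marginal_tax"] = tax_fun_marginal(total_sale, min_threeshold, max_cap, rate_1, rate_2)

# Compare with a tolerance, since the rates are floats.
print(np.allclose(df["marginal_tax"], df["tax"]))
```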
I would use indexing instead of if and else conditions.
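As a possible alternative in the same spirit, np.select takes a list of conditions and a list of values and, for each row, picks the value of the first condition that is true, which mirrors the order of an if/elif chain. This is only a sketch of the same bracket logic as the function above; tax_fun_select is a hypothetical name, and the data and parameters are repeated from the question so the example runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id_n": ["1", "2", "3", "4", "5"],
                   "sales1": [0, 115000, 440000, 500000, 740000],
                   "sales2": [0, 115000, 460000, 520000, 760000],
                   "tax": [0, 8050, 57500, 69500, 69500]})

min_threeshold = 500000
max_threeshold = 1020000
max_cap = 69500
rate_1 = 0.035
rate_2 = 0.1

def tax_fun_select(total_sale, tax, min_threeshold, max_threeshold, max_cap, rate_1, rate_2):
    # Conditions are checked in order; the first match wins for each row.
    conditions = [
        (total_sale > 0) & (tax == 0),
        total_sale < min_threeshold,
        total_sale <= max_threeshold,
    ]
    choices = [0.0, total_sale * rate_1, total_sale * rate_2]
    # Rows matching no condition (total above max_threeshold) get max_cap.
    return np.select(conditions, choices, default=max_cap)

total_sale = df["sales1"] + df["sales2"]
df["new_tax"] = tax_fun_select(total_sale, df["tax"], min_threeshold,
                               max_threeshold, max_cap, rate_1, rate_2)
print(df)
```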
Upvotes: 1