Question:
I'm new to Python. I wrote this function, which should normalize the price values in the "price" column of my DataFrame:
def normalize_price(df):
    for elements in df['price']:
        if (df["price"]>= 1000) and (df['price']<= 1499):
            df['price'] = 1000
            return
        elif 1500 <= df['price'] <= 2499:
            df['price'] = 1500
            return
        elif 2500 <= df['price'] <= 2999:
            df['price'] = 2500
            return
        elif 3000 <= df['price'] <= 3999:
            df['price'] = 3000
            return
When I call it, I get this error:
---------------------------------------------------------------------------
<ipython-input-86-1e239d3cbba4> in normalize_price(df)
20 def normalize_price(df):
21 for elements in df['price']:
---> 22 if (df["price"]>= 1000) and (df['price']<= 1499):
23 df['price'] = 1000
24 return
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
This is driving me crazy, so I'd like to know why it happens. Thanks!
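The cause, briefly: df["price"] >= 1000 does not produce a single True/False but a boolean Series with one value per row. Python's and (and the chained form 1500 <= x <= 2499, which Python expands into an and) has to turn that Series into one bool, and pandas refuses because it can't know whether you mean "any row" or "all rows". A minimal sketch with made-up sample prices that reproduces the error and shows the element-wise & operator instead:

```python
import pandas as pd

df = pd.DataFrame({"price": [900, 1200, 1800]})

# Comparing a whole column to a scalar gives a boolean Series, one value per row
mask_low = df["price"] >= 1000
mask_high = df["price"] <= 1499

# `and` calls bool() on mask_low, which raises the "ambiguous" ValueError
try:
    both = mask_low and mask_high
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# A chained comparison expands to (1500 <= s) and (s <= 2499), so it fails the same way
try:
    1500 <= df["price"] <= 2499
    chained_error = None
except ValueError as exc:
    chained_error = str(exc)

# The element-wise operator & combines the two masks row by row instead
in_range = mask_low & mask_high
print(in_range.tolist())  # [False, True, False]
```

The loop variable elements is never used either: every comparison inside the loop is made against the whole column, not the current value.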
Answer:
np.select is probably the easiest approach:
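To see how np.select picks a value, here is a tiny sketch on a plain NumPy array (the prices and choices are illustrative only). For each element it takes the choice paired with the first condition that is True, and falls back to default when none is; passing the original values as default follows the same pattern as passing df['price'] in the function below.

```python
import numpy as np

prices = np.array([905, 1200, 2100, 5192])

# Overlapping conditions on purpose: 905 satisfies both
conditions = [prices < 1000, prices < 2000]
choices = [0, 1000]

# First True condition wins; elements matching nothing keep their original price
result = np.select(conditions, choices, default=prices)
print(result.tolist())  # [0, 1000, 2100, 5192]
```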
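import numpy as np
import pandas as pd

def normalize_price(df):
    # create a list of conditions; combine comparisons with & (not `and`)
    # and wrap each comparison in parentheses
    cond = [
        (df['price'] >= 1000) & (df['price'] <= 1499),
        (df['price'] >= 1500) & (df['price'] <= 2499),
        (df['price'] >= 2500) & (df['price'] <= 2999),
        (df['price'] >= 3000) & (df['price'] <= 3999)
    ]
    # create a list of choices based on the conditions above
    choice = [
        1000,
        1500,
        2500,
        3000
    ]
    # use numpy.select and assign the array to df['price'];
    # rows that match no condition keep their original price
    df['price'] = np.select(cond, choice, df['price'])
    return df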
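For comparison, the question's if/elif chain does work once it is given one number at a time instead of the whole column. A minimal sketch using a hypothetical helper named bucket_price, applied per row with Series.apply; it reads closer to the original code but is usually slower than np.select on large frames. Sample prices are made up.

```python
import pandas as pd

def bucket_price(price):
    """Map a single price (a scalar, not a Series) to the lower edge of its range."""
    # These comparisons are on one number, so if/elif behaves as expected
    if 1000 <= price <= 1499:
        return 1000
    elif 1500 <= price <= 2499:
        return 1500
    elif 2500 <= price <= 2999:
        return 2500
    elif 3000 <= price <= 3999:
        return 3000
    # Prices outside every range are returned unchanged
    return price

def normalize_price(df):
    # Series.apply calls bucket_price once per value in the column
    df["price"] = df["price"].apply(bucket_price)
    return df

df = pd.DataFrame({"price": [905, 1200, 2100, 2895, 3462, 5192]})
print(normalize_price(df)["price"].tolist())  # [905, 1000, 1500, 2500, 3000, 5192]
```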
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10000, 50), columns=['price'])

def normalize_price(df):
    cond = [
        (df['price'] >= 1000) & (df['price'] <= 1499),
        (df['price'] >= 1500) & (df['price'] <= 2499),
        (df['price'] >= 2500) & (df['price'] <= 2999),
        (df['price'] >= 3000) & (df['price'] <= 3999)
    ]
    choice = [1000, 1500, 2500, 3000]
    df['price_new'] = np.select(cond, choice, df['price'])
    return df

normalize_price(df)
First 10 rows of the result (changed rows marked):
   price  price_new
0    235        235
1   5192       5192
2    905        905
3   7813       7813
4   2895       2500  <-----
5   5056       5056
6    144        144
7   4225       4225
8   7751       7751
9   3462       3000  <-----
Answer:
Here you should really avoid for loops and if statements. You just want to round down to the nearest multiple of 500, so you could do:
import pandas as pd
import numpy as np

df = pd.DataFrame({"price": [1200, 1600, 2100, 3499]})
# divide by 500, round down, multiply back: 1200 -> 1000, 2100 -> 2000
df["price"] = np.floor(df["price"] / 500) * 500
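Flooring to multiples of 500 is not exactly the question's mapping: prices from 2000 to 2499 become 2000 rather than 1500, and prices of 4000 and above are rounded down too instead of being left alone. A small sketch with illustrative prices shows both effects; it uses integer floor division (//), which keeps the integer dtype, whereas np.floor returns floats.

```python
import pandas as pd

df = pd.DataFrame({"price": [1200, 1600, 2100, 2600, 3499, 5192]})

# Floor division keeps int64: 2100 // 500 = 4, and 4 * 500 = 2000
df["price_500"] = df["price"] // 500 * 500
print(df["price_500"].tolist())  # [1000, 1500, 2000, 2500, 3000, 5000]
```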
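EDIT: if you are looking for a more general solution, scale the step with the number of digits in the price:
df = pd.DataFrame({"price": [1200, 1600, 2100, 3499, 3600, 140000, 160000]})
# step = 5 * 10**(number of digits - 2): 500 for 4-digit prices, 50000 for 6-digit prices
df["div"] = 5 * 10 ** (df["price"].astype(str).str.len() - 2)
df["price_new"] = np.floor(df["price"] / df["div"]) * df["div"]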
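One edge case in the general version: with the integer base 10, a one-digit price makes the exponent negative, and NumPy raises an error for negative integer powers of integers. A sketch that switches to a float base (10.0), still assuming positive whole-number prices (the sample values are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [7, 235, 1200, 3499, 140000, 160000]})

# Digit count from the string form (assumes positive integer prices)
digits = df["price"].astype(str).str.len()

# A float base lets the exponent go negative: 7 has 1 digit -> step 0.5
step = 5 * 10.0 ** (digits - 2)

df["price_new"] = np.floor(df["price"] / step) * step
print(df["price_new"].tolist())  # [7.0, 200.0, 1000.0, 3000.0, 100000.0, 150000.0]
```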
Answer:
You can use pandas.cut for that purpose. In your case:
import pandas as pd

bins = [1000, 1500, 2500, 3000, 4000]
# right=False gives left-closed intervals: [1000, 1500), [1500, 2500), ...
df["bin"] = pd.cut(df["price"], bins, right=False, retbins=False, labels=bins[:-1])
Here the bin column holds the output your function was meant to produce.
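One difference from the question's function: pd.cut returns a categorical column, and prices outside every bin become NaN instead of staying unchanged. A minimal sketch with made-up prices that converts the labels back to numbers and falls back to the original price where no bin matched:

```python
import pandas as pd

df = pd.DataFrame({"price": [905, 1200, 2100, 2895, 3462, 5192]})

bins = [1000, 1500, 2500, 3000, 4000]
# right=False makes the intervals [1000, 1500), [1500, 2500), ...
binned = pd.cut(df["price"], bins, right=False, labels=bins[:-1])

# Prices below 1000 or at/above 4000 fall in no bin and become NaN
print(binned.isna().tolist())  # [True, False, False, False, False, True]

# Turn the category labels into floats, then keep the original price where NaN
df["bin"] = binned.astype(float).fillna(df["price"])
print(df["bin"].tolist())  # [905.0, 1000.0, 1500.0, 2500.0, 3000.0, 5192.0]
```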
Ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html