mandiatutti

Reputation: 57

The truth value of a Series is ambiguous. Can't figure it out

I'm new to Python. I wrote this function, which should normalize the price values in the "price" column of my dataframe:

def normalize_price(df): 
    for elements in df['price']: 
        if (df["price"]>= 1000) and (df['price']<= 1499): 
            df['price'] = 1000 
            return
        elif 1500 <= df['price'] <= 2499:
            df['price'] = 1500 
            return
        elif 2500 <= df['price'] <= 2999:
            df['price'] = 2500 
            return
        elif 3000 <= df['price'] <= 3999:
            df['price'] = 3000 
            return

When I call it, I get this error:

---------------------------------------------------------------------------
<ipython-input-86-1e239d3cbba4> in normalize_price(df)
     20 def normalize_price(df):
     21     for elements in df['price']:
---> 22         if (df["price"]>= 1000) and (df['price']<= 1499):
     23             df['price'] = 1000
     24             return

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

And since I'm going crazy, I'd like to know why :) Thanks!

Upvotes: 0

Views: 318

Answers (3)

It_is_Chris

Reputation: 14063

np.select is probably the easiest approach

def normalize_price(df): 
    # create a list of conditions
    cond = [
        (df["price"]>= 1000) & (df['price']<= 1499),
        (df['price'] >= 1500) & (df['price'] <= 2499),
        (df['price'] >= 2500) & (df['price'] <= 2999),
        (df['price'] >= 3000) & (df['price'] <= 3999)
    ]
    # create a list of choices based on the conditions above
    choice = [
        1000,
        1500,
        2500,
        3000
    ]
    # use numpy.select and assign array to df['price']
    df['price'] = np.select(cond, choice, df['price'])
    return df

Update with an example:

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0,10000, 50), columns=['price'])

def normalize_price(df): 
    cond = [
        (df["price"]>= 1000) & (df['price']<= 1499),
        (df['price'] >= 1500) & (df['price'] <= 2499),
        (df['price'] >= 2500) & (df['price'] <= 2999),
        (df['price'] >= 3000) & (df['price'] <= 3999)
    ]

    choice = [
        1000,
        1500,
        2500,
        3000
    ]

    df['price_new'] = np.select(cond, choice, df['price'])
    return df

normalize_price(df)

    price  price_new
0     235        235
1    5192       5192
2     905        905
3    7813       7813
4    2895       2500 <-----
5    5056       5056
6     144        144
7    4225       4225
8    7751       7751
9    3462       3000 <----
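
As for why the original function fails: the conditions above are joined with & rather than and. Python's `and` needs a single True or False, so it asks the whole Series for its truth value, and pandas raises the ValueError from the question. A chained comparison such as `1500 <= df['price'] <= 2499` is evaluated with an implicit `and`, so it fails the same way. A minimal sketch with made-up prices (the `prices` Series is only for illustration):

```python
import pandas as pd

prices = pd.Series([235, 1200, 2895], name="price")

# `and` asks Python for bool(Series), which pandas refuses to answer.
try:
    (prices >= 1000) and (prices <= 1499)
    raised = False
except ValueError:
    raised = True

# A chained comparison expands to `(1000 <= prices) and (prices <= 1499)`,
# so it fails in the same way.
try:
    1000 <= prices <= 1499
    chained_raised = False
except ValueError:
    chained_raised = True

# `&` combines the two boolean Series element by element instead.
mask = (prices >= 1000) & (prices <= 1499)
mask_list = mask.tolist()
```

The parentheses around each comparison matter, because & binds more tightly than >= and <=. `prices.between(1000, 1499)` should give the same mask, since both ends are inclusive by default.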

Upvotes: 2

rpanai

Reputation: 13437

Here you should really avoid for loops and if statements. If all you need is to round each price down to a multiple of 500, you could do:

import pandas as pd
import numpy as np

df = pd.DataFrame({"price":[1200, 1600, 2100, 3499]})

df["price"] = (df["price"]/500).apply(np.floor)*500
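
The same rounding can also be sketched with integer floor division, which keeps the column as integers instead of floats. The `price_floor` column name is only an assumption for illustration. Note that this rounds 2100 down to 2000, whereas the ranges in the question map 1500 to 2499 onto 1500, so it is not an exact substitute for those bins.

```python
import pandas as pd

df = pd.DataFrame({"price": [1200, 1600, 2100, 3499]})

# Floor division rounds each price down to a multiple of 500
# and keeps the integer dtype.
df["price_floor"] = df["price"] // 500 * 500
result = df["price_floor"].tolist()
```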

EDIT: If you are looking for a more general solution:


df = pd.DataFrame({"price":[1200, 1600, 2100, 3499,3600, 140000, 160000]})

df["div"] = 5*10**(df["price"].astype(str).str.len()-2)
df["price_new"] = (df["price"]/df["div"]).apply(np.floor)*df["div"]

Upvotes: 2

Georgina Skibinski

Reputation: 13387

You can use pandas.cut for that purpose. In your case:

bins=[1000, 1500, 2500, 3000, 4000]

df["bin"]=pd.cut(df["price"], bins, right=False, retbins=False, labels=bins[:-1])

This assumes the "bin" column is the output column of your function.
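
One thing to watch with this approach: pandas.cut returns a categorical column, and prices outside every bin come back as NaN. A rough sketch of how that could be handled, assuming out-of-range prices should keep their original value (the sample prices and the `price_new` column name are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"price": [235, 1200, 1500, 2895, 3462, 5192]})

bins = [1000, 1500, 2500, 3000, 4000]

# right=False makes each bin closed on the left: [1000, 1500), [1500, 2500), ...
binned = pd.cut(df["price"], bins, right=False, labels=bins[:-1])

# Prices outside every bin are NaN; fall back to the original price there.
df["price_new"] = binned.astype("float").fillna(df["price"]).astype(int)
result = df["price_new"].tolist()
```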

Ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html

Upvotes: 0
