Gavin

Reputation: 1521

Comparing strings raises ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I am trying to use an if condition to update some values in a column, using the following code:

if df['COLOR_DESC'] == 'DARK BLUE':
    df['NEW_COLOR_DESC'] = 'BLUE'

But I got the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So what is wrong with this piece of code?

Upvotes: 1

Views: 2031

Answers (3)

Vince W.

Reputation: 3785

A Series behaves like an array. What you are asking is akin to putting this comparison in an if:

array([1, 2, 3]) == 1

That comparison does not produce a single True or False. In NumPy (the basis of pandas), comparison operators are applied to arrays elementwise and return an array of booleans, which if cannot interpret as one truth value. For example:

array([1, 2, 3]) == array([1, 2, 4])

gives:

array([ True,  True, False])

or, reduced to a single value with all:

all(array([1, 2, 3]) == array([1, 2, 4]))

gives:

False

Note: this is how NumPy arrays specifically work, not Python iterables in general.
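As a minimal sketch of the point above (the array values and variable names are only illustrative), comparing an array with a scalar gives one boolean per element, and a single truth value has to be requested explicitly with any() or all():

```python
import numpy as np

arr = np.array([1, 2, 3])

# Comparing with a scalar is applied elementwise: one boolean per element.
elementwise = arr == 1

# Reduce explicitly when a single True/False is needed.
any_match = bool(elementwise.any())   # is at least one element equal to 1?
all_match = bool(elementwise.all())   # are all elements equal to 1?

# Using the boolean array directly in an if statement raises ValueError,
# because NumPy refuses to guess which reduction was meant.
try:
    if arr == 1:
        pass
    raised = False
except ValueError:
    raised = True
```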

Upvotes: 0

Mad Physicist

Reputation: 114330

To answer your immediate question, the problem is that the expression df['COLOR_DESC'] == 'DARK BLUE' results in a Series of booleans. The error message is telling you that there is no one unambiguous way to convert that array to a single boolean value as if demands.
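A small sketch that reproduces this, assuming a made-up three-row frame (the data and variable names are illustrative, not from the question): the comparison yields a boolean Series, converting it to a single bool fails, and any() or all() give an unambiguous answer.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"COLOR_DESC": ["LIGHT RED", "DARK BLUE", "DARK BLUE"]})

# The comparison is evaluated row by row and returns a boolean Series.
mask = df["COLOR_DESC"] == "DARK BLUE"

# An if statement needs one bool; pandas refuses to pick one for a Series.
try:
    bool(mask)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Explicit reductions are unambiguous.
has_dark_blue = bool(mask.any())    # at least one matching row
only_dark_blue = bool(mask.all())   # every row matches
```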

The solution is actually not to use if at all, since an if statement is not applied separately to each element that equals 'DARK BLUE'. Use the boolean values directly as a mask instead:

rows = (df['COLOR_DESC'] == 'DARK BLUE')
df.loc[rows, 'COLOR_DESC'] = 'BLUE'

You have to use loc to update the original df because if you index it as df[rows]['COLOR_DESC'], you will be getting a copy of the required subset. Setting the values in the copy will not propagate back to the original, and you will even get a warning about that.
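To see the difference side by side, here is a hedged sketch on made-up data. The chained assignment normally triggers a pandas warning (which one depends on the pandas version); it is silenced here only to keep the output quiet.

```python
import warnings

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"COLOR_DESC": ["LIGHT RED", "DARK BLUE", "DARK BLUE"]})
rows = df["COLOR_DESC"] == "DARK BLUE"

# Chained indexing: df[rows] builds a new, temporary DataFrame, so the
# assignment changes that temporary object and df itself stays the same.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[rows]["COLOR_DESC"] = "BLUE"
after_chained = df["COLOR_DESC"].tolist()

# A single .loc call selects rows and column together and writes into df.
df.loc[rows, "COLOR_DESC"] = "BLUE"
after_loc = df["COLOR_DESC"].tolist()
```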

For example:

>>> import pandas as pd
>>> df = pd.DataFrame(data={'COLOR_DESC': ['LIGHT_RED', 'DARK BLUE', 'MEDIUM_GREEN', 'DARK BLUE']})
>>> df
     COLOR_DESC
0     LIGHT_RED
1     DARK BLUE
2  MEDIUM_GREEN
3     DARK BLUE

>>> rows = (df['COLOR_DESC'] == 'DARK BLUE')
>>> rows
0    False
1     True
2    False
3     True
Name: COLOR_DESC, dtype: bool

>>> df.loc[rows, 'COLOR_DESC'] = 'BLUE'
>>> df
     COLOR_DESC
0     LIGHT_RED
1          BLUE
2  MEDIUM_GREEN
3          BLUE

Upvotes: 1

nanojohn

Reputation: 582

Try using .loc indexing like this instead:

df['NEW_COLOR_DESC'] = df['COLOR_DESC']
df.loc[df['COLOR_DESC'] == 'DARK BLUE', 'NEW_COLOR_DESC'] = 'BLUE'

The reason your solution does not work is that each row of your dataframe has its own truth value (True/False), depending on whether that row contains the value 'DARK BLUE', so there is no single value for if to test. The .loc indexer lets you select only the rows that satisfy a condition (df['COLOR_DESC'] == 'DARK BLUE') and set the value in the named column ('NEW_COLOR_DESC') to the new value ('BLUE').
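A runnable sketch of this approach on made-up data (the sample values and the name alt are illustrative). It also shows Series.mask as a possible one-step alternative that replaces values where the condition is True; that variant is not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"COLOR_DESC": ["LIGHT RED", "DARK BLUE", "GREEN"]})

# Start the new column as a copy of the old one, then overwrite only the
# rows where the condition holds.
df["NEW_COLOR_DESC"] = df["COLOR_DESC"]
df.loc[df["COLOR_DESC"] == "DARK BLUE", "NEW_COLOR_DESC"] = "BLUE"

# Alternative: mask() returns a new Series with 'BLUE' wherever the
# condition is True and the original value everywhere else.
alt = df["COLOR_DESC"].mask(df["COLOR_DESC"] == "DARK BLUE", "BLUE")
```

Either way the original COLOR_DESC column is left untouched, which matches the question's intent of writing to a separate NEW_COLOR_DESC column.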

Upvotes: 0
