Reputation: 1521
I am trying to use an if condition to update some values in a column using the following code:
if df['COLOR_DESC'] == 'DARK BLUE':
    df['NEW_COLOR_DESC'] = 'BLUE'
But I got the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So what is wrong with this piece of code?
Upvotes: 1
Views: 2031
Reputation: 3785
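A Series behaves like an array. What you are asking is akin to:
array([1, 2, 3]) == 1
The array as a whole is of course not equal to 1, but in numpy (the basis of pandas) the convention is that comparison operators work on arrays elementwise, so the result is an array of booleans rather than a single True or False. The same happens when you compare two arrays: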
array([1, 2, 3]) == array([1, 2, 4])
gives:
array([True, True, False])
or
all(array([1, 2, 3]) == array([1, 2, 4]))
gives:
False
Note: this is how numpy arrays specifically work, not Python iterables in general.
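The elementwise behaviour described above can be checked with a short sketch. The variable names here are only illustrative, and `np.all` is used in place of the built-in `all` (both give the same answer for a one-dimensional array):

```python
import numpy as np

a = np.array([1, 2, 3])

# Comparing with a scalar is applied to every element.
scalar_cmp = a == 1

# Comparing two arrays of the same length is elementwise as well.
array_cmp = a == np.array([1, 2, 4])

# Reduce the boolean array to a single value explicitly.
all_equal = bool(np.all(array_cmp))
any_equal = bool(np.any(array_cmp))

print(scalar_cmp, array_cmp, all_equal, any_equal)
```

Only the reduced values (`all_equal`, `any_equal`) are safe to use directly in an `if`.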
Upvotes: 0
Reputation: 114330
To answer your immediate question, the problem is that the expression df['COLOR_DESC'] == 'DARK BLUE' results in a Series of booleans. The error message is telling you that there is no single unambiguous way to convert that Series to one boolean value, as if demands.
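As a minimal sketch of what the error is about, the snippet below builds a small, made-up frame (only the column name comes from the question), triggers the same ValueError, and then shows the explicit reductions the message suggests:

```python
import pandas as pd

# Hypothetical sample data; only the column name is taken from the question.
df = pd.DataFrame({'COLOR_DESC': ['LIGHT RED', 'DARK BLUE', 'DARK BLUE']})

mask = df['COLOR_DESC'] == 'DARK BLUE'   # one boolean per row

try:
    if mask:          # `if` needs a single truth value, the Series has three
        pass
    raised = False
except ValueError:
    raised = True

# Explicit reductions remove the ambiguity.
any_match = bool(mask.any())   # is at least one row 'DARK BLUE'?
all_match = bool(mask.all())   # is every row 'DARK BLUE'?
```

Neither reduction updates individual rows, which is why a mask is the better tool for this question.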
The solution is actually not to use if, since an if statement is not applied to each element that is 'DARK BLUE'. Use the boolean values directly as a mask instead:
rows = (df['COLOR_DESC'] == 'DARK BLUE')
df.loc[rows, 'COLOR_DESC'] = 'BLUE'
You have to use loc to update the original df because if you index it as df[rows]['COLOR_DESC'], you will be getting a copy of the required subset. Setting the values in the copy will not propagate back to the original, and you will even get a warning about that.
For example:
>>> import pandas as pd
>>> df = pd.DataFrame(data={'COLOR_DESC': ['LIGHT_RED', 'DARK_BLUE', 'MEDIUM_GREEN', 'DARK_BLUE']})
>>> df
     COLOR_DESC
0     LIGHT_RED
1     DARK_BLUE
2  MEDIUM_GREEN
3     DARK_BLUE
>>> rows = (df['COLOR_DESC'] == 'DARK_BLUE')
>>> rows
0    False
1     True
2    False
3     True
Name: COLOR_DESC, dtype: bool
>>> df.loc[rows, 'COLOR_DESC'] = 'BLUE'
>>> df
     COLOR_DESC
0     LIGHT_RED
1          BLUE
2  MEDIUM_GREEN
3          BLUE
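To see why chained indexing does not work, here is a rough sketch that tries both forms on the same kind of data. Warnings are silenced only to keep the output quiet; which warning you get depends on the pandas version, so no particular one is assumed here:

```python
import warnings

import pandas as pd

df = pd.DataFrame({'COLOR_DESC': ['LIGHT_RED', 'DARK_BLUE', 'MEDIUM_GREEN', 'DARK_BLUE']})
rows = df['COLOR_DESC'] == 'DARK_BLUE'

# Chained indexing: df[rows] is a new object, so the assignment lands on a copy.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df[rows]['COLOR_DESC'] = 'BLUE'
after_chained = df['COLOR_DESC'].tolist()   # original is unchanged

# A single .loc call selects rows and column together and writes to df itself.
df.loc[rows, 'COLOR_DESC'] = 'BLUE'
after_loc = df['COLOR_DESC'].tolist()

print(after_chained)
print(after_loc)
```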
Upvotes: 1
Reputation: 582
Try using .loc indexing like this instead:
df['NEW_COLOR_DESC'] = df['COLOR_DESC']
df.loc[df['COLOR_DESC'] == 'DARK BLUE', 'NEW_COLOR_DESC'] = 'BLUE'
The reason your solution does not work is that each row of your dataframe has its own truth value (True/False), depending on whether that row contains the value 'DARK BLUE', so there is no single value for if to test. The .loc indexer lets you select only the rows that meet a condition (df['COLOR_DESC'] == 'DARK BLUE') and set the value in the named column ('NEW_COLOR_DESC') to the new value ('BLUE').
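A small sketch of the two-step approach on made-up data, to confirm that the original column stays untouched. The one-line variant using `Series.mask` is a different technique, not something from this answer, and is included only for comparison:

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({'COLOR_DESC': ['DARK BLUE', 'RED', 'DARK BLUE']})

# Step 1: start the new column as a copy of the old one.
df['NEW_COLOR_DESC'] = df['COLOR_DESC']
# Step 2: overwrite only the rows where the condition holds.
df.loc[df['COLOR_DESC'] == 'DARK BLUE', 'NEW_COLOR_DESC'] = 'BLUE'

# Alternative (assumption, not from the answer): Series.mask replaces values
# where the condition is True and keeps the rest.
alt = df['COLOR_DESC'].mask(df['COLOR_DESC'] == 'DARK BLUE', 'BLUE')

print(df)
```

Both forms leave COLOR_DESC as it was and give the same NEW_COLOR_DESC values.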
Upvotes: 0