Reputation: 4860
I create a Pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'some_number': [1, 2, 3, 4, 5, 6]})
Then I want to add a column called is_even:
df.assign(
is_even = lambda x : 'YES' if x.some_number % 2 == 0 else 'NO'
)
I get an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I understand the error is telling me that x.some_number in the if condition is a Series. That confuses me, because if I do this:
df.assign(
is_even = lambda x : 'YES' if 1==2 else x.some_number
)
It works and produces a DataFrame whose is_even column holds the same values as some_number, which seems to indicate that x.some_number is in fact not a Series, but a scalar value.
I know there are other ways to accomplish what I'm trying to accomplish. But I'm interested in the behavior.
Why is x.some_number seen as a Series when it appears in the if condition, but as a value when it's used in the else clause?
INSTALLED VERSIONS
------------------
python : 3.8.0.final.0
python-bits : 32
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LOCALE : English_United States.1252
pandas : 0.25.3
numpy : 1.17.4
IPython : 7.10.0
matplotlib : 3.1.2
Upvotes: 1
Views: 292
Reputation: 6642
The problem is only the if-statement in your first example. There, x.some_number % 2 == 0 compares a Series with a scalar and produces a boolean Series, but if needs a single True or False, so pandas raises the ValueError. The second example works because the condition 1==2 is a plain scalar comparison (which is of course ok), and the else branch returns the whole Series. Returning a Series (or a scalar) is exactly what the function passed to assign needs to do.
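A minimal sketch of where the error comes from, reusing the question's DataFrame (the names mask, result and error_message are only illustrative). The comparison itself succeeds; it is the conversion of its result to a single bool that fails:

```python
import pandas as pd

df = pd.DataFrame({'some_number': [1, 2, 3, 4, 5, 6]})

# The comparison is evaluated element-wise and yields a boolean Series,
# not a single True/False.
mask = df.some_number % 2 == 0
print(type(mask).__name__)
print(mask.tolist())

# A conditional expression calls bool() on its condition, and pandas
# refuses to reduce a multi-element Series to one bool.
try:
    result = 'YES' if mask else 'NO'
except ValueError as err:
    result = None
    error_message = str(err)
    print(error_message)
```

In the second example from the question, the condition is 1==2, so no Series is ever converted to a bool and the Series in the else branch is simply returned.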
Now, what you actually want is a row-wise comparison. Use apply for that:
df['is_even'] = df.some_number.apply(lambda x: 'YES' if x % 2 == 0 else 'NO' )
Here, x is a scalar and the if-statement works as expected. Alternatively, you could combine assign and a lambda function:
df.assign(
is_even = lambda x : x.some_number.apply(lambda x: 'YES' if x % 2 == 0 else 'NO')
)
Notice the difference from your first example again: the outer lambda receives the whole DataFrame, and the inner lambda only has to deal with scalars in if x % 2 == 0. The outer lambda returns a Series, just like in your second example.
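As a further option that is not part of the answer above, the same column could probably be built without any per-row Python function by using numpy.where, which selects between two values element by element. This is only a sketch; the lambda parameter name d is arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'some_number': [1, 2, 3, 4, 5, 6]})

# np.where picks 'YES' or 'NO' for each element of the boolean mask,
# so no Python-level `if` ever sees the whole Series.
out = df.assign(
    is_even=lambda d: np.where(d.some_number % 2 == 0, 'YES', 'NO')
)
print(out)
```

As with the other assign examples, this returns a new DataFrame and leaves df unchanged.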
Upvotes: 1
Reputation: 1845
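Your proof doesn't pan out. DataFrame.assign can handle either a Series or a scalar and assign it to the DataFrame, so a successful result does not show that the lambda returned a scalar. Here it returns an actual scalar, which is broadcast to every row: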
In [7]: df.assign(is_even=lambda x: x.some_number[0] )
Out[7]:
some_number is_even
0 1 1
1 2 1
2 3 1
3 4 1
4 5 1
5 6 1
If you read the docs carefully, you'll see that each keyword value can be either a callable or a non-callable value such as a Series, and that assign handles it depending on which it is:
The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change input DataFrame (though pandas doesn’t check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned.
Also, if you dig into the source a bit:
# >= 3.6 preserve order of kwargs
if PY36:
    for k, v in kwargs.items():
        data[k] = com.apply_if_callable(v, data)
You can see that if the value is a callable, assign passes the entire DataFrame to it.
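One way to check this yourself is to pass a function that records what it receives. The sketch below uses the question's DataFrame; probe, seen and times_ten are made-up names for illustration:

```python
import pandas as pd

df = pd.DataFrame({'some_number': [1, 2, 3, 4, 5, 6]})

seen = []  # records the type of each argument assign passes in

def probe(frame):
    # Called with the whole DataFrame, not once per row.
    seen.append(type(frame).__name__)
    return frame.some_number * 10

out = df.assign(times_ten=probe)
print(seen)
print(out.times_ten.tolist())
```

The list seen ends up with a single entry, so inside the lambda from the question x.some_number is always the full column, whether it appears in the if condition or in the else branch.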
Upvotes: 0