Why does python lambda see a series instead of a value?

Question

I create a Pandas DataFrame:

import pandas as pd

df = pd.DataFrame({'some_number': [1, 2, 3, 4, 5, 6]})

Then I want to add a column called is_even:

df.assign(
    is_even = lambda x : 'YES' if x.some_number % 2 == 0 else 'NO'
)

I get an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I understand the error is telling me that x.some_number in the if condition is a Series. That confuses me, because if I do this:

df.assign(
    is_even = lambda x : 'YES' if 1==2 else x.some_number
)

It works and generates this output:

   some_number  is_even
0            1        1
1            2        2
2            3        3
3            4        4
4            5        5
5            6        6

which indicates that x.some_number is in fact not a Series, but a scalar value.

I know there are other ways to accomplish what I'm trying to accomplish. But I'm interested in the behavior.

Why is x.some_number treated as a Series when it appears in the if condition, but as a value when it appears in the else clause?

INSTALLED VERSIONS
------------------
python           : 3.8.0.final.0
python-bits      : 32
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder        : little
LOCALE           : English_United States.1252

pandas           : 0.25.3
numpy            : 1.17.4
IPython          : 7.10.0
matplotlib       : 3.1.2

mcsoini · Accepted Answer

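The problem is only the condition of the if-expression. In your first example, x.some_number % 2 == 0 compares a Series with a scalar, which produces a boolean Series, and if cannot reduce a whole Series to a single True or False; that is what the ValueError is telling you. The second example works because its condition, 1==2, is a plain scalar comparison (which is of course ok). It evaluates to False, so the lambda returns x.some_number, which is a Series. Returning a Series (or a scalar) is exactly what the function passed to assign needs to do. In both examples x is the whole DataFrame and x.some_number is a Series, never a single value.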
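To make this visible, here is a minimal sketch (the names passthrough, seen, and condition are made up for illustration and are not from the question). It suggests that assign calls the function once with the whole DataFrame, and that the modulo comparison yields a boolean Series whose truth value cannot be taken:

```python
import pandas as pd

df = pd.DataFrame({'some_number': [1, 2, 3, 4, 5, 6]})

seen = []  # illustrative: records the type of each argument the callable receives

def passthrough(x):
    # assign passes the whole DataFrame here, not one row at a time
    seen.append(type(x).__name__)
    return x.some_number  # a Series, which assign accepts as a new column

result = df.assign(is_even=passthrough)
print(seen)

# The comparison is element-wise, so the result is a boolean Series
condition = df.some_number % 2 == 0
print(type(condition).__name__)

# Asking for a single truth value of that Series raises the error from the question
try:
    bool(condition)
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

The conditional expression in the first example does the equivalent of bool(condition) implicitly, which is where the ValueError comes from.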

Now, what you actually want is a row-wise comparison. Use apply for that:

df['is_even'] = df.some_number.apply(lambda x: 'YES' if x % 2 == 0 else 'NO' )

Here, x is a scalar and the if-statement works as expected. Alternatively, you could combine assign and a lambda function:

df.assign(
    is_even = lambda x : x.some_number.apply(lambda x: 'YES' if x % 2 == 0 else 'NO')
)

Notice again the difference from your first example: the outer lambda makes sure that the inner lambda only has to deal with scalars in if x % 2 == 0. The outer lambda returns a Series, just like in your second example.
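As a side note, the same column can also be built without any per-element if. The sketch below is not part of the approach above; it swaps in numpy.where, which selects between two values element by element based on the boolean Series, and it assumes NumPy is available alongside pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'some_number': [1, 2, 3, 4, 5, 6]})

# np.where picks 'YES' where the condition is True and 'NO' elsewhere,
# so the lambda never has to take the truth value of a whole Series
result = df.assign(
    is_even=lambda x: np.where(x.some_number % 2 == 0, 'YES', 'NO')
)
print(result)
```

Here the lambda still receives the whole DataFrame, but the condition is consumed element-wise instead of by an if.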
