Trevor

Reputation: 4860

Why does a Python lambda see a Series instead of a value?

I create a Pandas DataFrame:

import pandas as pd
df = pd.DataFrame({'some_number': [1, 2, 3, 4, 5, 6]})

Then I want to add a column called is_even:

df.assign(
    is_even = lambda x : 'YES' if x.some_number % 2 == 0 else 'NO'
)

I get an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I understand the error is telling me that x.some_number in the if condition is a Series, which confuses me, because if I do this:

df.assign(
    is_even = lambda x : 'YES' if 1==2 else x.some_number
)

It works: the is_even column simply contains the values of some_number, which seems to indicate that x.some_number is in fact not a Series, but a scalar value.

I know there are other ways to accomplish what I'm trying to accomplish. But I'm interested in the behavior.

Why is x.some_number treated as a Series when it appears in the if condition, but as a value when it appears in the else clause?

INSTALLED VERSIONS
------------------
python           : 3.8.0.final.0
python-bits      : 32
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder        : little
LOCALE           : English_United States.1252

pandas           : 0.25.3
numpy            : 1.17.4
IPython          : 7.10.0
matplotlib       : 3.1.2

Upvotes: 1

Views: 292

Answers (2)

mcsoini

Reputation: 6642

The problem is only the if condition. In your first example, x.some_number % 2 == 0 compares a Series with a scalar and produces a boolean Series, and a Series has no single truth value for if to test, so this can never work. The second example works because the condition 1==2 is a plain scalar comparison (which is of course fine) and the expression returns a Series. Returning a Series (or a scalar) is exactly what the function passed to assign needs to do.
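To see this concretely, here is a minimal sketch (the names condition and raised are only for illustration) showing that the condition evaluates to a boolean Series, and that asking for its single truth value raises the error from the question:

```python
import pandas as pd

df = pd.DataFrame({'some_number': [1, 2, 3, 4, 5, 6]})

# The comparison is evaluated element-wise and yields a boolean Series,
# not a single True/False.
condition = df.some_number % 2 == 0
print(type(condition).__name__)
print(condition.tolist())

# `if` needs exactly one truth value, so Python calls bool() on the Series,
# and pandas refuses because the answer would be ambiguous.
try:
    bool(condition)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

The same ValueError is what the conditional expression inside the lambda triggers.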

What you actually want is a row-wise comparison. Use apply for that:

df['is_even'] = df.some_number.apply(lambda x: 'YES' if x % 2 == 0 else 'NO' )

Here, x is a scalar and the if condition works as expected. Alternatively, you could combine assign and a lambda function:

df.assign(
    is_even = lambda x : x.some_number.apply(lambda x: 'YES' if x % 2 == 0 else 'NO')
)

Notice the difference from your first example again: the outer lambda makes sure that the inner lambda only has to deal with scalars in if x % 2 == 0. The outer lambda returns a Series, just like in your second example.
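As a rough sketch of the same idea with the two levels named differently (yes_no and frame are hypothetical names chosen here for readability), the outer callable receives the DataFrame and the inner function only ever sees one number at a time:

```python
import pandas as pd

df = pd.DataFrame({'some_number': [1, 2, 3, 4, 5, 6]})


def yes_no(value):
    # `value` is a single number here, so the conditional is unambiguous.
    return 'YES' if value % 2 == 0 else 'NO'


# The outer lambda gets the whole DataFrame and returns a Series.
result = df.assign(is_even=lambda frame: frame.some_number.apply(yes_no))

print(result.is_even.tolist())
# assign returns a new DataFrame; the original is left without the column.
print('is_even' in df.columns)
```

Using distinct argument names avoids the two x variables shadowing each other, which may make the scalar/Series distinction easier to follow.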

Upvotes: 1

Kirk

Reputation: 1845

Your proof doesn't pan out. pandas DataFrame.assign can handle either a Series or a scalar and assign it to the DataFrame; a scalar is simply repeated for every row:

In [7]: df.assign(is_even=lambda x: x.some_number[0] )                                                                 
Out[7]: 
   some_number  is_even
0            1        1
1            2        1
2            3        1
3            4        1
4            5        1
5            6        1

If you read the docs carefully, you'll see that the parameter accepts either a callable or a plain value such as a Series, and handles it according to its type:

The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change input DataFrame (though pandas doesn’t check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned.

Also, if you dig into the source a bit:

# >= 3.6 preserve order of kwargs
if PY36:
    for k, v in kwargs.items():
        data[k] = com.apply_if_callable(v, data)

You can see that if the value is a callable, pandas passes the entire DataFrame to it.
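One way to check this yourself is to pass a function that records what it was called with. This is only a small sketch; record_and_return, seen, and the times_ten column are made-up names for the demonstration:

```python
import pandas as pd

df = pd.DataFrame({'some_number': [1, 2, 3, 4, 5, 6]})

seen = []  # type names of the arguments the callable receives


def record_and_return(frame):
    # assign calls this once, with the DataFrame itself, not once per row.
    seen.append(type(frame).__name__)
    return frame.some_number * 10


result = df.assign(times_ten=record_and_return)

print(seen)
print(result.times_ten.tolist())
```

The callable runs a single time and gets the DataFrame, so x.some_number inside the lambda from the question is always the full column.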

Upvotes: 0
