CodeTrek

Reputation: 475

Lambda function without passing argument

I have an example dataframe with columns 'one' and 'two' consisting of some random ints. I was trying to understand some code with a lambda function in more depth and was puzzled that the code seems to magically work without providing an argument to be passed to the lambda function.

Initially, I create a new column 'newcol' with the pandas assign() method by calling a lambda function explicitly as func(df). The function returns the natural log of the df's 'one' column:

df=df.assign(newcol=func(df))

So far so good.

However, what puzzles me is that the code works as well without passing df.

df=df.assign(newcol2=func)

Even if I don't pass (df) into the lambda function, it correctly performs the operation. How does the interpreter know that df is being passed into the lambda function?

Example code below and output:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(1,10,size=16).reshape(8,2),columns=["one","two"])
func=lambda x: np.log(x.one)
df=df.assign(newcol=func(df))
print(df)

#This one works too, but why?
df=df.assign(newcol2=func)
print(df)
Output:
   one  two    newcol   newcol2
0    1    8  0.000000  0.000000
1    6    7  1.791759  1.791759
2    2    6  0.693147  0.693147
3    2    8  0.693147  0.693147
4    4    2  1.386294  1.386294
5    9    3  2.197225  2.197225
6    2    2  0.693147  0.693147
7    4    7  1.386294  1.386294

(Note: I could have written the lambda inline in assign(), but I define it separately here for the sake of clarity.)

Upvotes: 2

Views: 237

Answers (2)

Florian Bernard

Reputation: 2569

This has nothing to do with compilation; it is simply how the assign() source code is written, as described in the pandas documentation for assign():

Where the value is a callable, evaluated on df:
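As a minimal sketch of what that sentence means (the column names and values here are illustrative assumptions, not taken from the question), passing a callable gives the same result as computing the column yourself and passing the finished Series:

```python
import pandas as pd

# Illustrative data; the column name "temp_c" is an assumption for this sketch.
df = pd.DataFrame({"temp_c": [17.0, 25.0]}, index=["Portland", "Berkeley"])

# Callable pathway: assign() calls the lambda with the DataFrame as its argument.
via_callable = df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)

# Non-callable pathway: the Series is computed first and passed in as-is.
via_series = df.assign(temp_f=df["temp_c"] * 9 / 5 + 32)

print(via_callable)
print(via_series)
```

Both calls produce the same 'temp_f' column; the only difference is who evaluates the expression, you or assign().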

Upvotes: 0

norok2

Reputation: 26896

If you pass a callable to pd.DataFrame.assign(), pandas calls it with the dataframe itself as its only argument and assigns the result to the new column.

For example, if you change your code to the following:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(1,10,size=16).reshape(8,2),columns=["one","two"])
func=lambda c, x: np.log(x.one + c)
df=df.assign(newcol=func(1, df))
print(df)

#This one will no longer work!
df=df.assign(newcol2=func)
print(df)

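the last call to assign() fails with a TypeError, because assign() calls func with the dataframe as its only argument, while func now requires two.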
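If you do need an extra parameter, one possible workaround (a sketch, not the only option) is to bind it up front so that assign() again receives a one-argument callable. The example below uses functools.partial and a wrapping lambda; the small fixed dataframe is an assumption for illustration:

```python
from functools import partial

import numpy as np
import pandas as pd

# Small fixed dataframe for illustration (the question uses random ints).
df = pd.DataFrame({"one": [1, 6, 2], "two": [8, 7, 6]})
func = lambda c, x: np.log(x.one + c)

# Passing the two-argument function directly fails: assign() supplies only df.
failed = False
try:
    df.assign(newcol2=func)
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# Bind c=1 first, leaving a callable that takes just the dataframe.
fixed = df.assign(newcol2=partial(func, 1))

# Equivalent alternative: wrap the call in a one-argument lambda.
wrapped = df.assign(newcol2=lambda d: func(1, d))

print(fixed)
```

Either way, assign() ends up calling something that accepts the dataframe alone, which is all the callable pathway requires.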

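This is explained in the official documentation. In df.assign(newcol=func(1, df)), func is called before assign() runs, so assign() receives a ready-made Series (the non-callable pathway). In df.assign(newcol2=func), assign() receives the function itself and calls it with the dataframe (the callable pathway).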
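To make the two pathways concrete, here is a simplified sketch of the dispatch. The function assign_sketch is a hypothetical stand-in written for this answer, not the actual pandas source, and it ignores details the real method handles:

```python
import numpy as np
import pandas as pd


def assign_sketch(df, **kwargs):
    """Hypothetical, simplified stand-in for DataFrame.assign()."""
    out = df.copy()
    for name, value in kwargs.items():
        if callable(value):
            # Callable pathway: call the value with the dataframe as its only argument.
            out[name] = value(out)
        else:
            # Non-callable pathway: use the value as it was passed in.
            out[name] = value
    return out


df = pd.DataFrame({"one": [1, 6, 2], "two": [8, 7, 6]})
func = lambda x: np.log(x.one)

# newcol goes through the non-callable branch, newcol2 through the callable one.
result = assign_sketch(df, newcol=func(df), newcol2=func)
print(result)
```

Both new columns hold the same values, which mirrors what the question observed with the real assign().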

Upvotes: 1
