Reputation: 475
I have an example dataframe with columns 'one' and 'two' containing random ints. I was trying to understand some code that uses a lambda function in more depth, and I was puzzled that the code seems to work without my passing any argument to the lambda function.
First, I create a new column 'newcol' with the pandas assign() method, passing df into an explicitly defined lambda function, func(df). The function returns the log of df's 'one' column:
df=df.assign(newcol=func(df))
So far so good.
However, what puzzles me is that the code also works without passing df:
df=df.assign(newcol2=func)
Even though I don't pass df to the lambda function, it performs the operation correctly. How does the interpreter know that df should be passed into the lambda function?
Example code and output:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1,10,size=16).reshape(8,2),columns=["one","two"])
func=lambda x: np.log(x.one)
df=df.assign(newcol=func(df))
print(df)
#This one works too, but why?
df=df.assign(newcol2=func)
print(df)
Output:
   one  two    newcol   newcol2
0    1    8  0.000000  0.000000
1    6    7  1.791759  1.791759
2    2    6  0.693147  0.693147
3    2    8  0.693147  0.693147
4    4    2  1.386294  1.386294
5    9    3  2.197225  2.197225
6    2    2  0.693147  0.693147
7    4    7  1.386294  1.386294
(Note: I could have written the lambda inline in the assign call, but I defined func explicitly here for clarity.)
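For reference, the inline form would look roughly like this. This is a sketch that uses a small fixed frame instead of random ints so the values are reproducible; assign still receives a callable and calls it with the DataFrame.

```python
import numpy as np
import pandas as pd

# Small fixed frame (instead of random ints) so the result is reproducible.
df = pd.DataFrame({"one": [1, 6, 2], "two": [8, 7, 6]})

# The lambda is written inline; assign calls it with the DataFrame as x.
out = df.assign(newcol2=lambda x: np.log(x.one))
print(out)
```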
Upvotes: 2
Views: 237
Reputation: 2569
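It's not the interpreter (or any compilation step) figuring this out; it's simply how assign is written. When a keyword's value is callable, assign calls it with the DataFrame as its argument. As the pandas assign documentation puts it:
"Where the value is a callable, evaluated on df"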
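A minimal sketch of that mechanism, assuming a simplified model of assign. The helper assign_sketch is hypothetical and is not the actual pandas source (the real method handles more cases), but it mirrors the key step: a callable value is called with the frame, and any other value is stored as-is.

```python
import numpy as np
import pandas as pd


def assign_sketch(df, **kwargs):
    """Hypothetical, simplified stand-in for DataFrame.assign (not pandas' code)."""
    data = df.copy()  # assign never modifies the original frame
    for name, value in kwargs.items():
        # Key step: a callable is invoked with the frame as its only argument;
        # anything else (Series, array, scalar) is stored unchanged.
        data[name] = value(data) if callable(value) else value
    return data


df = pd.DataFrame({"one": [1, 6, 2], "two": [8, 7, 6]})
func = lambda x: np.log(x.one)

via_sketch = assign_sketch(df, newcol2=func)
via_pandas = df.assign(newcol2=func)
print(via_sketch.equals(via_pandas))  # True
```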
Upvotes: 0
Reputation: 26896
If you use pd.DataFrame.assign() and pass a callable, assign calls it with the dataframe itself as the first (and only) argument.
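To see what assign actually passes in, here is a small sketch with a hypothetical spy callable that records its argument at call time. It assumes a reasonably recent pandas, where later keywords in the same assign call can refer to columns created by earlier ones. The argument turns out to be a DataFrame, but a copy rather than the original df object.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1, 6, 2], "two": [8, 7, 6]})
seen = []  # what assign handed to the callable


def spy(frame):
    # Record the argument's type, its columns, and whether it is df itself.
    seen.append((type(frame).__name__, list(frame.columns), frame is df))
    return np.log(frame["one"])


out = df.assign(
    newcol=spy,
    # This callable receives the frame with 'newcol' already added.
    doubled=lambda frame: frame["newcol"] * 2,
)
print(seen)  # [('DataFrame', ['one', 'two'], False)]
print(list(out.columns))  # ['one', 'two', 'newcol', 'doubled']
```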
For example, if you change your code to the following:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1,10,size=16).reshape(8,2),columns=["one","two"])
func=lambda c, x: np.log(x.one + c)
df=df.assign(newcol=func(1, df))
print(df)
#This one will no longer work!
df=df.assign(newcol2=func)
print(df)
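the last call to assign() will not work: assign calls func with the dataframe as its only argument, so the dataframe is bound to c, x is never supplied, and Python raises a TypeError.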
This is explained in the official documentation.
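A quick way to confirm that failure mode, as a sketch with a small fixed frame: catching the exception shows that assign called func with a single argument, leaving x missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1, 6, 2], "two": [8, 7, 6]})
func = lambda c, x: np.log(x.one + c)

message = None
try:
    # assign calls func(frame): the frame binds to c and x is never supplied.
    df.assign(newcol2=func)
except TypeError as exc:
    message = str(exc)

print(message)  # complains about the missing positional argument 'x'
```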
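The line df.assign(newcol=func(1, df)) uses the non-callable pathway (func runs first, and assign only receives the resulting Series), while the line df.assign(newcol2=func) uses the callable pathway (assign receives the function itself and calls it as func(df)).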
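If you want the two-argument function to go through the callable pathway, one option is to wrap it so that assign still sees a one-argument callable. A sketch, showing both a wrapping lambda and functools.partial (which fixes c=1 up front) next to the non-callable pathway; all three columns should hold the same values.

```python
import functools

import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1, 6, 2], "two": [8, 7, 6]})
func = lambda c, x: np.log(x.one + c)

out = df.assign(
    newcol=func(1, df),                    # non-callable pathway: a ready Series
    newcol2=lambda frame: func(1, frame),  # callable pathway via a wrapper
    newcol3=functools.partial(func, 1),    # callable pathway via partial
)
print(out)
```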
Upvotes: 1