Reputation: 1323
I am sorry for asking such a trivial question, but I keep making mistakes when using the apply function with a lambda function that has input parameters.
See below:
df = pd.DataFrame([["John",1,3],["James",2,3],
["Femi",3,4], ["Rita",3,3],
["Rita",3,3]], columns=["Name","Age","Height"])
%timeit df["product_AH"] = df[["Age", "Height"]].apply(lambda x,y: x['Age']*y['Height'], axis=1)
Expected output:
    Name  Age  Height  product_AH
0   John    1       3           3
1  James    2       3           6
2   Femi    3       4          12
3   Rita    3       3           9
4   Rita    3       3           9
Upvotes: 0
Views: 3878
Reputation: 30971
If you have to use the "apply" variant, the code should be:
df['product_AH'] = df.apply(lambda row: row.Age * row.Height, axis=1)
With axis=1, the function passed to apply receives a single argument: the whole row. That is why a lambda with two parameters (x, y) fails.
But a much quicker solution is:
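To make the single-argument behaviour concrete, here is a minimal sketch using the DataFrame from the question. The helper name `product` and the variables `seen` and `two_arg_error` are only illustrative, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    [["John", 1, 3], ["James", 2, 3], ["Femi", 3, 4], ["Rita", 3, 3], ["Rita", 3, 3]],
    columns=["Name", "Age", "Height"],
)

seen = []  # records the type of whatever apply hands to the function


def product(row):
    # With axis=1, `row` is one row of the frame, passed as a single Series.
    seen.append(type(row).__name__)
    return row["Age"] * row["Height"]


df["product_AH"] = df[["Age", "Height"]].apply(product, axis=1)

# A two-parameter lambda is still called with one argument, so Python
# raises a TypeError for the missing second positional argument.
try:
    df[["Age", "Height"]].apply(lambda x, y: x["Age"] * y["Height"], axis=1)
    two_arg_error = None
except TypeError as exc:
    two_arg_error = exc

print(df)
print(type(two_arg_error).__name__)
```

Both columns are read from the same `row` object, so no second parameter is needed.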
df['product_AH'] = df.Age * df.Height
(1.43 ms, compared to 5.08 ms for the "apply" variant).
This way the computation is vectorized, i.e. it operates on whole columns at once. By contrast, apply visits each row separately, calls the function on it, then assembles all the results and saves them in the target column, which is considerably slower.
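If you want to check this on your own machine, a rough sketch like the one below compares the two variants outside of IPython, using the standard timeit module instead of %timeit. The variable names and the repetition count of 200 are arbitrary choices, and the absolute timings will differ from the figures quoted above.

```python
import timeit

import pandas as pd

df = pd.DataFrame(
    [["John", 1, 3], ["James", 2, 3], ["Femi", 3, 4], ["Rita", 3, 3], ["Rita", 3, 3]],
    columns=["Name", "Age", "Height"],
)

# Row-by-row variant: the lambda is called once per row.
via_apply = df.apply(lambda row: row.Age * row.Height, axis=1)

# Vectorized variant: one multiplication over the two whole columns.
vectorized = df.Age * df.Height

# Both variants should give the same values.
same_result = bool((via_apply == vectorized).all())

# Total time for 200 repetitions of each variant (in seconds).
t_apply = timeit.timeit(
    lambda: df.apply(lambda row: row.Age * row.Height, axis=1), number=200
)
t_vectorized = timeit.timeit(lambda: df.Age * df.Height, number=200)

print(f"same result: {same_result}")
print(f"apply:      {t_apply:.4f} s for 200 runs")
print(f"vectorized: {t_vectorized:.4f} s for 200 runs")
```

On a five-row frame the gap is modest; it usually widens as the number of rows grows, because the apply variant pays the per-row function-call cost every time.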
Upvotes: 1