JA-pythonista

Reputation: 1323

Using lambda functions with apply for Pandas DataFrame

I am sorry for asking such a trivial question, but I keep making mistakes when using the apply function with a lambda function that has input parameters.

See below:

df = pd.DataFrame([["John",1,3],["James",2,3],
            ["Femi",3,4], ["Rita",3,3],
            ["Rita",3,3]], columns=["Name","Age","Height"])


%timeit df["product_AH"] = df[["Age", "Height"]].apply(lambda x,y: x['Age']*y['Height'], axis=1)

Expected output:

    Name    Age  Height  product_AH
0   John    1     3          3
1   James   2     3          6
2   Femi    3     4          12
3   Rita    3     3          9
4   Rita    3     3          9

Upvotes: 0

Views: 3878

Answers (1)

Valdi_Bo

Reputation: 30971

If you have to use the "apply" variant, the code should be:

df['product_AH'] = df.apply(lambda row: row.Age * row.Height, axis=1)

With axis=1, the parameter passed to the applied function is the whole row (a Series), so the lambda takes a single argument and both columns are read from that row. That is why the two-parameter lambda in the question fails.
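For illustration, here is a minimal, self-contained sketch (rebuilding the question's DataFrame) that shows the row being received as one Series and the columns being accessed by label:

import pandas as pd

# Rebuild the question's DataFrame so the snippet is self-contained.
df = pd.DataFrame([["John", 1, 3], ["James", 2, 3],
                   ["Femi", 3, 4], ["Rita", 3, 3],
                   ["Rita", 3, 3]], columns=["Name", "Age", "Height"])

# With axis=1, apply passes each row as a single Series, so the lambda
# takes one parameter and both columns are read from that row.
df["product_AH"] = df.apply(lambda row: row["Age"] * row["Height"], axis=1)
print(df)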

But a much quicker solution is:

df['product_AH'] = df.Age * df.Height

(1.43 ms, compared to 5.08 ms for the "apply" variant).

This way the computation is vectorized, whereas apply visits each row separately, calls the function on it, and then assembles the results into the target column, which is considerably slower.
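If you want to reproduce the comparison outside IPython's %timeit, here is a rough sketch using the standard timeit module; exact timings will vary with your machine and the size of the DataFrame:

import timeit
import pandas as pd

df = pd.DataFrame([["John", 1, 3], ["James", 2, 3],
                   ["Femi", 3, 4], ["Rita", 3, 3],
                   ["Rita", 3, 3]], columns=["Name", "Age", "Height"])

# apply calls the lambda once per row, building the result row by row.
t_apply = timeit.timeit(
    lambda: df.apply(lambda row: row.Age * row.Height, axis=1), number=1000)

# The vectorized version multiplies whole columns in one operation.
t_vector = timeit.timeit(lambda: df.Age * df.Height, number=1000)

print(f"apply:      {t_apply:.4f} s for 1000 runs")
print(f"vectorized: {t_vector:.4f} s for 1000 runs")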

Upvotes: 1
