Jonathan Herrera

Reputation: 6144

Compare Performance of df.apply and Column Operations in python pandas

I would like to know whether basic arithmetic on the columns of a dataframe is faster when done columnwise or via apply. Offhand, I would assume that columnwise is faster. But both approaches are considered 'vectorized' operations. So, is df.apply comparably fast?

Upvotes: 2

Views: 1004

Answers (1)

Jonathan Herrera

Reputation: 6144

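We can just try this out. With axis=1, df.apply calls the Python function once per row, so it is not vectorized in the NumPy sense, whereas df['A'] + df['B'] runs as a single operation in compiled code. The example below demonstrates that the columnwise operation is (much) faster: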

import numpy as np
import pandas as pd
from datetime import datetime


def applywise_duration(df):
    # Row-wise addition: the lambda is called once per row.
    start_time = datetime.now()
    df['C'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
    end_time = datetime.now()
    return end_time - start_time

def columnwise_duration(df):
    # Columnwise addition: both columns are added in a single operation.
    start_time = datetime.now()
    df['C'] = df['A'] + df['B']
    end_time = datetime.now()
    return end_time - start_time

df_apply = pd.DataFrame(
        np.random.randint(0,10000,size=(1000000, 2)),
        columns=list('AB')
)
df_vector = df_apply.copy()

# Store the results under new names so the timing functions are not shadowed.
apply_time = applywise_duration(df_apply)
column_time = columnwise_duration(df_vector)

print('Duration of apply: ', apply_time)
print('Duration of columnwise addition: ', column_time)
print('Ratio: ', column_time / apply_time)
print('That means, columnwise addition is %s times faster '
      'than addition via apply!'
      % str(apply_time / column_time))

This gives the following on my machine:

Duration of apply:  0:00:23.631236
Duration of columnwise addition:  0:00:00.004234
Ratio:  0.00017916963801639492
That means, columnwise addition is 5581.302786962683 times faster than addition via apply!
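
A single datetime-based measurement can be noisy, so here is a rough sketch of the same comparison using timeit with a few repeats. The smaller frame size (N_ROWS), the seeded generator, and the helper names add_with_apply and add_columnwise are my own assumptions for illustration, not part of the measurement above. It also checks that both approaches return the same values. The exact numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: a smaller frame than above, just to keep the run short.
N_ROWS = 10_000
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10000, size=(N_ROWS, 2)), columns=list('AB'))


def add_with_apply(frame):
    # Hypothetical helper: calls the lambda once per row at Python level.
    return frame.apply(lambda row: row['A'] + row['B'], axis=1)


def add_columnwise(frame):
    # Hypothetical helper: one addition over the two whole columns.
    return frame['A'] + frame['B']


# Both approaches should produce the same values.
same_result = bool((add_with_apply(df) == add_columnwise(df)).all())

# Take the best of a few repeats to reduce noise from other processes.
apply_seconds = min(timeit.repeat(lambda: add_with_apply(df), number=1, repeat=3))
column_seconds = min(timeit.repeat(lambda: add_columnwise(df), number=10, repeat=3)) / 10

print('Same result:', same_result)
print('apply:      %.6f s' % apply_seconds)
print('columnwise: %.6f s' % column_seconds)
print('Speedup:    %.0fx' % (apply_seconds / column_seconds))
```

Taking the minimum over repeats is a common convention for micro-benchmarks, since slower runs usually reflect interference from other processes rather than the code itself.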

Upvotes: 3
