I would like to know whether basic arithmetic on the columns of a DataFrame is faster when done columnwise or via apply. Offhand, I would assume that columnwise is faster, but both approaches are considered 'vectorized' operations. So, is df.apply comparably fast?
We can just try this out. The example below demonstrates that the columnwise operation is (much) faster:
```python
import numpy as np
import pandas as pd
from datetime import datetime


def applywise_duration(df):
    """Time row-by-row addition via DataFrame.apply(axis=1)."""
    start_time = datetime.now()
    df['C'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
    end_time = datetime.now()
    return end_time - start_time


def columnwise_duration(df):
    """Time addition of the whole columns at once."""
    start_time = datetime.now()
    df['C'] = df['A'] + df['B']
    end_time = datetime.now()
    return end_time - start_time


# One million rows with two integer columns, A and B.
df_apply = pd.DataFrame(
    np.random.randint(0, 10000, size=(1000000, 2)),
    columns=list('AB')
)
df_vector = df_apply.copy()

# Store the results under new names so they do not shadow the functions.
apply_duration = applywise_duration(df_apply)
column_duration = columnwise_duration(df_vector)

print('Duration of apply: ', apply_duration)
print('Duration of columnwise addition: ', column_duration)
print('Ratio: ', column_duration / apply_duration)
print('That means, columnwise addition is %s times faster '
      'than addition via apply!' % str(apply_duration / column_duration))
```
This gives the following on my machine:
```
Duration of apply: 0:00:23.631236
Duration of columnwise addition: 0:00:00.004234
Ratio: 0.00017916963801639492
That means, columnwise addition is 5581.302786962683 times faster than addition via apply!
```
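The gap is this large because, with axis=1, apply calls the Python function once per row and passes each row in as a pandas Series, while df['A'] + df['B'] is a single NumPy-level addition over the whole columns. So only the columnwise form is vectorized in the usual sense. The sketch below illustrates this under a few assumptions that are not part of the original benchmark: it uses a smaller 10,000-row frame so it runs quickly, `add_row` and `calls` are hypothetical names, and it times with the standard-library `timeit` module rather than `datetime.now()`.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical smaller frame (10,000 rows) so the sketch finishes quickly.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10000, size=(10_000, 2)), columns=list('AB'))

# Count how many times apply(axis=1) invokes the Python function.
calls = 0


def add_row(row):
    """Hypothetical helper: adds A and B for a single row (a pandas Series)."""
    global calls
    calls += 1
    return row['A'] + row['B']


via_apply = df.apply(add_row, axis=1)
columnwise = df['A'] + df['B']  # one NumPy addition over both columns

print('Rows:', len(df), '| Python calls made by apply:', calls)
print('Same values:', bool((via_apply == columnwise).all()))

# timeit uses time.perf_counter; the minimum of several repeats reduces noise.
t_apply = min(timeit.repeat(lambda: df.apply(add_row, axis=1), number=1, repeat=3))
t_col = min(timeit.repeat(lambda: df['A'] + df['B'], number=100, repeat=3)) / 100
print(f'apply: {t_apply:.4f} s | columnwise: {t_col:.6f} s | ratio: {t_apply / t_col:.0f}x')
```

Exact timings will vary by machine and pandas version; the call count should be at least the number of rows, which is the per-row Python overhead that columnwise addition avoids.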