Reputation: 1130
Since pandas uses NumPy under the hood, why is the plain NumPy code (509 ms) about 12x faster than the same operation on a DataFrame (6.38 s) in the example below?
# function with numpy arrays
def f_np(freq, asd):
    for f in np.arange(21., 2000., 1.):
        fi = freq / f
        gi = (1 + fi**2) / ((1 - fi**2)**2 + fi**2) * asd
        df['fi'] = fi
        df['gi'] = gi
        # process each df ...

# function with dataframe
def f_df(df):
    for f in np.arange(21., 2000., 1.):
        df['fi'] = df.Freq / f
        df['gi'] = (1 + df.fi**2) / ((1 - df.fi**2)**2 + df.fi**2) * df.ASD
        # process each df ...

freq = np.arange(20., 2000., .1)
asd = np.ones(len(freq))
df = pd.DataFrame({'Freq': freq, 'ASD': asd})

%timeit f_np(freq, asd)
%timeit f_df(df)
509 ms ± 723 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
6.38 s ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Upvotes: 0
Views: 510
Reputation: 16683
Are you sure that the difference in speed is because of "some operation with a dataframe" in this specific case? I think the difference is attributable to the fact that, in the first example, you computed fi and gi as local variables and then assigned them to the columns, but you didn't do that in the second example, where every term reads the columns back from the DataFrame. The timings were similar when I used local variables in both.
import numpy as np
import pandas as pd

# function with numpy arrays
def f_np(freq, asd):
    for f in np.arange(21., 2000., 1.):
        fi = freq / f
        gi = (1 + fi**2) / ((1 - fi**2)**2 + fi**2) * asd
        df['fi'] = fi
        df['gi'] = gi
        # process each df ...

# function with dataframe
def f_df(df):
    for f in np.arange(21., 2000., 1.):
        fi = freq / f
        gi = (1 + fi**2) / ((1 - fi**2)**2 + fi**2) * asd
        df['fi'] = fi
        df['gi'] = gi
        # process each df ...

freq = np.arange(20., 2000., .1)
asd = np.ones(len(freq))
df = pd.DataFrame({'Freq': freq, 'ASD': asd})

%timeit f_np(freq, asd)
%timeit f_df(df)
#562 ms ± 9.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#569 ms ± 17.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
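If you want the function to take only the DataFrame, one option might be to pull the columns out as NumPy arrays once, before the loop, and do the arithmetic on those. The sketch below is only an illustration under my own assumptions: the helper names (gain, f_series, f_arrays) and the shortened frequency sweep are made up for the example, and it uses timeit from the standard library instead of %timeit so it runs outside IPython. I have not included timings, since they will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def gain(fi, asd):
    # Same expression as in the question; works on Series or ndarrays.
    return (1 + fi**2) / ((1 - fi**2)**2 + fi**2) * asd


def f_series(df, fs):
    # Every term goes through pandas Series lookups and Series arithmetic.
    for f in fs:
        df['fi'] = df['Freq'] / f
        df['gi'] = gain(df['fi'], df['ASD'])


def f_arrays(df, fs):
    # Extract the underlying arrays once, then stay in NumPy inside the loop.
    freq = df['Freq'].to_numpy()
    asd = df['ASD'].to_numpy()
    for f in fs:
        fi = freq / f
        df['fi'] = fi
        df['gi'] = gain(fi, asd)


freq = np.arange(20., 2000., .1)
df_a = pd.DataFrame({'Freq': freq, 'ASD': np.ones(len(freq))})
df_b = df_a.copy()

# Hypothetical shorter sweep than in the question, to keep the demo quick.
fs = np.arange(21., 121., 1.)

t_series = timeit.timeit(lambda: f_series(df_a, fs), number=3)
t_arrays = timeit.timeit(lambda: f_arrays(df_b, fs), number=3)
print(f"Series arithmetic: {t_series:.3f} s")
print(f"ndarray arithmetic: {t_arrays:.3f} s")
```

Both versions leave the same fi and gi columns in the DataFrame after the last iteration, so the comparison is like for like; only the place where the arithmetic happens differs.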
Upvotes: 1