Reputation: 667
I need help speeding up this loop, and I am not sure how to go about it:
import numpy as np
import pandas as pd
import timeit

n = 1000
df = pd.DataFrame({0: np.random.rand(n), 1: np.random.rand(n)})

def loop():
    result = pd.DataFrame(index=df.index, columns=['result'])
    for i in df.index:
        # last row that still has a partner i rows further down (n-1-i)
        last_index_to_consider = df.index.values[::-1][i]
        # difference between each row and the row i positions below it
        tdf = df.loc[:last_index_to_consider] - df.shift(-i).loc[:last_index_to_consider]
        tdf = tdf.apply(lambda x: x**2)
        # sum the squared differences across columns, then average over rows
        tsumdf = tdf.sum(axis=1)
        result.loc[i, 'result'] = tsumdf.mean()
    return result

print(timeit.timeit(loop, number=10))
Is it possible to tweak the for loop to make it faster? Are there options using numba, or could I go ahead and use multiple threads to speed this loop up?
What would be the most sensible way to get more performance than simply running this code as it is?
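For reference, here is what one iteration of `loop()` computes, assuming the default RangeIndex 0, ..., n-1 (so `last_index_to_consider` equals n-1-i). Writing a_{j,c} for row j, column c of the data, the value stored at row i is the mean, over the n-i rows that still have a partner i rows further down, of the squared differences summed across both columns:

$$
r_i = \frac{1}{n-i} \sum_{j=0}^{n-1-i} \sum_{c \in \{0,1\}} \left(a_{j,c} - a_{j+i,c}\right)^2, \qquad i = 0, \dots, n-1
$$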
Upvotes: 3
Views: 166
Reputation: 18628
Just for fun, the ultimate speedup with numba:
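import numba
import numpy as np
import pandas as pd

@numba.njit
def numba_kernel(d0, d1):
    # d0, d1: the two columns as 1-D float arrays
    n = len(d0)
    result = np.empty(n, np.float64)
    for i in range(n):            # i: lag between the paired rows
        s = 0.0
        k = i                     # k = j + i, the partner of row j
        for j in range(n - i):
            u = d0[j] - d0[k]
            v = d1[j] - d1[k]
            k += 1
            s += u*u + v*v
        result[i] = s / (n - i)   # mean over the n - i row pairs
    return result

def loop2(df):
    # df.values.T has one row per column, so * passes the columns as d0, d1
    return pd.DataFrame({'result': numba_kernel(*df.values.T)})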
That is a speedup factor of 2500x+:
In [519]: %timeit loop2(df)
583 µs ± 5.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
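Because `numba.njit` compiles ordinary Python functions, the kernel's index arithmetic (the partner row `k` starts at `i` and advances with `j`, i.e. `k = j + i`) can be sanity-checked without numba: run the same double loop undecorated on a small input (slow, but fine for n = 50) and compare it with the question's pandas computation. This is a sketch; `plain_kernel` and `pandas_reference` are hypothetical names, and `pandas_reference` is the question's `loop` rewritten to take the frame as an argument.

```python
import numpy as np
import pandas as pd


def plain_kernel(d0, d1):
    """Same double loop as the numba kernel, without @numba.njit (pure Python)."""
    n = len(d0)
    result = np.empty(n, np.float64)
    for i in range(n):              # lag between paired rows
        s = 0.0
        for j in range(n - i):      # pair row j with row j + i
            u = d0[j] - d0[j + i]
            v = d1[j] - d1[j + i]
            s += u * u + v * v
        result[i] = s / (n - i)     # average over the n - i pairs
    return result


def pandas_reference(df):
    """The question's loop, taking df as an argument (hypothetical name)."""
    result = pd.DataFrame(index=df.index, columns=['result'])
    for i in df.index:
        last = df.index.values[::-1][i]
        tdf = df.loc[:last] - df.shift(-i).loc[:last]
        result.loc[i, 'result'] = (tdf ** 2).sum(axis=1).mean()
    return result


rng = np.random.default_rng(0)
df = pd.DataFrame({0: rng.random(50), 1: rng.random(50)})
expected = pandas_reference(df)['result'].to_numpy(dtype=float)
got = plain_kernel(*df.values.T)
print(np.allclose(got, expected))  # prints True
```

Once the pure-Python version agrees with pandas, adding the `@numba.njit` decorator changes the speed but not the arithmetic.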
Upvotes: 1
Reputation: 221504
There's a lot of computation happening per iteration. Keeping the loop as it is, we can work on the underlying array data and use np.einsum for the squared-sum reductions, which should bring about speedups. Here's an implementation along those lines:
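import numpy as np
import pandas as pd

def array_einsum_loop(df):
    a = df.values
    l = len(a)
    out = np.empty(l)
    for i in range(l):
        # row j minus row j + i, for all rows that have a partner at lag i
        d = a[:l-i] - a[i:]
        # sum of all squared entries of d
        out[i] = np.einsum('ij,ij->', d, d)
    # divide each lag's sum by its number of row pairs, l - i
    df_out = pd.DataFrame({'result': out/np.arange(l, 0, -1)})
    return df_out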
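In `array_einsum_loop`, the subscript string `'ij,ij->'` multiplies `d` by itself element-wise and sums over both indices, so it gives the same number as `(d * d).sum()`; einsum can do this reduction without first materializing `d * d` as a separate array. The divisor `np.arange(l, 0, -1)` is `[l, l-1, ..., 1]`, the count of row pairs `l - i` for each lag `i`. A tiny check on made-up numbers:

```python
import numpy as np

d = np.array([[1.0, -2.0],
              [3.0, 0.5]])

# 'ij,ij->': multiply element-wise, then sum over both i and j
full_sum = np.einsum('ij,ij->', d, d)
print(full_sum)                              # 14.25 (= 1 + 4 + 9 + 0.25)
print(np.isclose(full_sum, (d * d).sum()))   # True

# number of (j, j + i) row pairs for each lag i when l = 4
l = 4
print(np.arange(l, 0, -1))                   # [4 3 2 1]
```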
Runtime test:
In [153]: n = 1000
...: df = pd.DataFrame({0:np.random.rand(n),1:np.random.rand(n)})
In [154]: %timeit loop(df)
1 loop, best of 3: 1.43 s per loop
In [155]: %timeit array_einsum_loop(df)
100 loops, best of 3: 5.61 ms per loop
In [156]: 1430/5.61
Out[156]: 254.9019607843137
Not bad: a 250x+ speedup without getting rid of the loop or breaking the bank!
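The approach above keeps the Python loop over lags. As a further sketch (a different technique, not part of this answer), the loop can be removed by expanding the square per column: the sum over j of (x_j - x_{j+i})^2 equals the sum of x_j^2 for j = 0..n-1-i, plus the sum of x_j^2 for j = i..n-1, minus 2 times the sum of x_j * x_{j+i}. The first two terms are cumulative sums, and the last is the autocorrelation, which `np.correlate(x, x, mode='full')` returns for every lag at once (still O(n²) work, but in compiled code; for large n, `scipy.signal.correlate(..., method='fft')` may be faster). `no_loop` is a hypothetical name. Expanding the square can lose precision through cancellation when values are large relative to their differences, so it is worth comparing against the loop version on real data.

```python
import numpy as np
import pandas as pd


def no_loop(df):
    """Hypothetical loop-free variant: expand (x_j - x_{j+i})**2 per column."""
    a = df.to_numpy(dtype=float)
    n = len(a)
    sq = a * a
    # head[i] = sum of sq[j] for j = 0 .. n-1-i
    head = np.cumsum(sq, axis=0)[::-1]
    # tail[i] = sum of sq[j] for j = i .. n-1
    tail = np.cumsum(sq[::-1], axis=0)[::-1]
    # cross[i] = sum_j a[j] * a[j + i], per column;
    # entry n-1+i of the 'full' autocorrelation holds lag i
    cross = np.column_stack(
        [np.correlate(col, col, mode='full')[n - 1:] for col in a.T]
    )
    total = (head + tail - 2.0 * cross).sum(axis=1)
    return pd.DataFrame({'result': total / np.arange(n, 0, -1)})


# usage: compare against a direct per-lag computation
rng = np.random.default_rng(1)
df = pd.DataFrame({0: rng.random(200), 1: rng.random(200)})
a = df.to_numpy()
direct = np.array([((a[:len(a) - i] - a[i:]) ** 2).sum() / (len(a) - i)
                   for i in range(len(a))])
print(np.allclose(no_loop(df)['result'].to_numpy(), direct))  # prints True
```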
Upvotes: 3