This is my dataframe of prices:
   stock1  stock2
0     2.3    10.1
1     1.9    11.2
2     3.5    10.5
3     2.8    10.8
4     3.1    10.3
5     2.7     9.8
6     3.3    10.2
And this is the dataframe of price supports I want to obtain:
   stock1  stock2
0     NaN     NaN
1     1.9    10.1
2     1.9    10.1
3     1.9    10.5
4     2.8    10.1
5     1.9     NaN
6     2.7     9.8
Let's focus on the first column of prices. The idea is to find the down peaks (local minima) like this:
prices1 = prices['stock1']
mask = (prices1.shift(1) > prices1) & (prices1.shift(-1) > prices1)
supports1 = prices1.where(mask, NaN)
supports1.iloc[0] = min(prices1[0], prices1[1])
supports1 = supports1.shift(1).ffill()
and we obtain
   stock1
0     NaN
1     1.9
2     1.9
3     1.9
4     2.8
5     2.8
6     2.7
Another rule is that, for every price, the support must be lower than the price. This doesn't hold in row 5, because 2.8 > 2.7. To correct it, we look backward through the support column for the first value lower than the current price (NaN if none exists). In this case the right value is 1.9.
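To make the backward-search rule concrete, here is a minimal sketch (the index 5 and all values are taken from the sample column above; plain lists stand in for the pandas Series):

```python
from numpy import nan

# First column of prices and its supports after the shift/ffill step
prices1   = [2.3, 1.9, 3.5, 2.8, 3.1, 2.7, 3.3]
supports1 = [nan, 1.9, 1.9, 1.9, 2.8, 2.8, 2.7]

# Row 5 breaks the rule: support 2.8 is not lower than price 2.7.
# Walk backward through the support column and take the first value
# lower than the current price (NaN if there is none).
i = 5
fixed = next((s for s in reversed(supports1[:i]) if s < prices1[i]), nan)
print(fixed)  # 1.9
```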
I found two ways to solve the problem, but both need to iterate, and as the dataframe grows they become incredibly slow. I'd like it 10 times faster, hopefully 100 times. This is my code:
from pandas import DataFrame
from numpy import NaN
from numpy.random import uniform
from timeit import timeit

##rows = 5000
##cols = 10
##d = {}
##for i in range(cols):
##    d['stock_{}'.format(i)] = 100*uniform(0.95, 1.05, rows).cumprod()
##prices = DataFrame(d)
prices = DataFrame({'stock1': [2.3, 1.9, 3.5, 2.8, 3.1, 2.7, 3.3],
                    'stock2': [10.1, 11.2, 10.5, 10.8, 10.3, 9.8, 10.2]})
#----------------------------------------------------------------
def calc_supports1(prices):
    supports = DataFrame().reindex_like(prices)
    for stock in prices:
        prices1 = prices[stock]
        mask = (prices1.shift(1) > prices1) & (prices1.shift(-1) > prices1)
        supports1 = prices1.where(mask, NaN)
        supports1.iloc[0] = min(prices1[0], prices1[1])
        supports1 = supports1.shift(1).ffill()
        sup = supports1.drop_duplicates()
        for i, v in prices1.loc[prices1 < supports1].items():
            mask = (sup.index < i) & (sup < v)
            sup2 = sup.values[mask.values]
            supports1.at[i] = sup2[-1] if len(sup2) > 0 else NaN
        supports[stock] = supports1
    return supports
#----------------------------------------------------------------
def calc_supports2(prices):
    supports = DataFrame().reindex_like(prices)
    for stock in prices:
        prices1 = prices[stock]
        sup = [min(prices1[0], prices1[1])]
        supports1 = [NaN, sup[0]]
        for i in range(2, len(prices1)):
            while len(sup) > 0 and prices1[i] < sup[0]:
                sup.pop(0)
            if prices1[i-1] < prices1[i] and prices1[i-1] < prices1[i-2]:
                sup.insert(0, prices1[i-1])
            supports1.append(sup[0] if len(sup) > 0 else NaN)
        supports[stock] = supports1
    return supports
#----------------------------------------------------------------
print('fun1', timeit('calc_supports1(prices)',
                     setup='from __main__ import calc_supports1, prices', number=1))
print('fun2', timeit('calc_supports2(prices)',
                     setup='from __main__ import calc_supports2, prices', number=1))
How can I speed this up?
There are multiple performance problems in this code. In particular, `calc_supports2` pops and inserts at the front of the `sup` list, which shifts every remaining element on each call, and it reads every price through pandas indexing (`prices1[i]`), which is far slower than plain list access. The corrected version below uses `sup` as a stack (operating at its end), reads the prices from a plain Python list, and keeps the last three prices in sliding-window variables:
def calc_supports3(prices):
    supports = DataFrame().reindex_like(prices)
    for stock in prices:
        prices1 = list(prices[stock])
        sup = [min(prices1[0], prices1[1])]
        supports1 = [NaN, sup[-1]]
        # Sliding window over the last three prices
        prices1_im2 = NaN
        prices1_im1 = prices1[0]
        prices1_im0 = prices1[1]
        for i in range(2, len(prices1)):
            prices1_im2, prices1_im1, prices1_im0 = prices1_im1, prices1_im0, prices1[i]
            while len(sup) > 0 and prices1_im0 < sup[-1]:
                sup.pop()
            if prices1_im1 < prices1_im0 and prices1_im1 < prices1_im2:
                sup.append(prices1_im1)
            supports1.append(sup[-1] if len(sup) > 0 else NaN)
        supports[stock] = supports1
    return supports
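The cost of operating at the front of a list, as `calc_supports2` does, can be seen in isolation with a quick illustrative micro-benchmark (the list size is arbitrary):

```python
from timeit import timeit

n = 50000
# Popping from the front of a list shifts all remaining elements (O(n)
# per call); popping from the end is O(1). The rewrite relies on this
# by using `sup` as a stack.
front = timeit('while xs: xs.pop(0)', setup='xs = list(range(%d))' % n, number=1)
back = timeit('while xs: xs.pop()', setup='xs = list(range(%d))' % n, number=1)
print(front > back)  # front-popping is dramatically slower
```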
Here are performance results on your small dataset on my machine:
fun1 0.006461 s
fun2 0.000901 s
fun3 0.000648 s (40% faster than fun2)
Here are performance results on a randomly-generated dataset with 50 000 lines:
fun1 3.916947 s
fun2 2.064891 s
fun3 0.034465 s (60 times faster than fun2)