Reputation: 389
Hello, I have a huge list of values. I want to find all n-value patterns, like list[0:30], list[1:31], and for each value in a pattern compute the percentage change relative to the first, like percentage_change(array[0], array[1]), percentage_change(array[0], array[2]), all the way to the end of the pattern. After that, I want to store all the 30-value patterns in an array of patterns to compare against other values in the future.
To do so I built a function. In this function, the 30 values can be changed to any number of my choice through the numberOfEntries variable. For each pattern, I take the 10 next outcomes (to average later) and store them in an outcomes array at the same index.
#end point is the end of array
#inputs: (array, numberOfEntries)
#outputs: (list of Patterns, list of outcomes)
y = 0
condition = numberOfEntries + 1
#each pattern list
pattern = []
#list of patterns
Patterns = []
#outcomes array
outcomes = []
while y < len(array):
    i = 1
    while i < condition:
        #this is the percentage change function; I built it inline to gain speed.
        #try is used because of the possibility of division by zero
        try:
            x = ((float(array[y-(numberOfEntries-i)]) - array[y-numberOfEntries]) / abs(array[y-numberOfEntries])) * 100.00
            if x == 0.0:
                x = 0.000000001
        except ZeroDivisionError:
            x = 0.00000001
        i += 1
        pattern.append(x)
    #here are the outcomes
    outcomeRange = array[y+5:y+15]
    outcomes.append(outcomeRange)
    Patterns.append(pattern)
    #reset the pattern list
    pattern = []
    y += 1
Doing this on an 8559-value array, which is small compared to the quantity of data I have, took me 229.6792 seconds.
Is there a way to adapt this to multithreading, or some other way to improve the speed?
EDIT:
To explain better, I have this ohlc data:
open high low close volume
TimeStamp
2016-08-20 15:50:00 0.003008 0.003008 0.002995 0.003000 6.351215
2016-08-20 15:55:00 0.003000 0.003008 0.003000 0.003008 6.692174
2016-08-20 16:00:00 0.003008 0.003009 0.002996 0.003001 10.813029
2016-08-20 16:05:00 0.003001 0.003000 0.002991 0.002991 4.368509
2016-08-20 16:10:00 0.002991 0.002993 0.002989 0.002990 6.662944
2016-08-20 16:15:00 0.002990 0.003015 0.002989 0.003015 8.495640
I extract this as
array=df['close'].values
Then I apply this array to the function and it returns a list full of lists like this one for this particular set of values:
[0.26, 0.03, -0.03, -0.04, 0.005]
These are the percent changes of each row relative to the beginning of the sample, and this is what I call a pattern. I can choose how many entries a pattern has.
Hope I'm more clear now...
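To make the pattern definition concrete, here is a minimal sketch using the first six close values from the sample above (percentage_change is a hypothetical helper matching the description, not taken from the original code):

```python
# Hypothetical helper: percent change of a value relative to the first sample.
def percentage_change(first, value):
    return (value - first) / abs(first) * 100.0

# The close column from the sample OHLC data above.
close = [0.003000, 0.003008, 0.003001, 0.002991, 0.002990, 0.003015]

# One "pattern": each later close compared to the first one.
pattern = [percentage_change(close[0], v) for v in close[1:]]
print([round(p, 4) for p in pattern])
```

Each entry of `pattern` is the percent change of one row against close[0], which is exactly what the inner loop of the question computes.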
Upvotes: 1
Views: 1499
Reputation: 140246
First, I would turn the inner while loop into a for loop, so that i is incremented by the loop itself rather than by interpreted code, which is faster:
for i in range(1,condition):
Now, since y
doesn't change within your inner loop, you can optimize your computation from:
x = ((float(array[y-(numberOfEntries-i)])-array[y-numberOfEntries])/abs(array[y-numberOfEntries]))*100.00
to:
x = (float(array[y-(numberOfEntries-i)])-array[y-numberOfEntries]) * z
where z is precomputed before the while/for loop as:
z = 100.00 / abs(array[y-numberOfEntries])
Why?

- z is precomputed, so there is no abs computation and no array access inside the loop.
- z is the inverse of the value to divide by, so you can multiply by it. Multiplication is way faster than division.
- computing z can raise the ZeroDivisionError outside the loop, and it has to be handled accordingly (wrap the whole z + loop block in try/except and set x = 0.00000001 when it occurs; it should be equivalent).

So your inner loop could be:
try:
    z = 100.00 / abs(array[y-numberOfEntries])
    for i in range(1, condition):
        x = (float(array[y-(numberOfEntries-i)]) - array[y-numberOfEntries]) * z
        pattern.append(x)
except ZeroDivisionError:
    # one sentinel per entry, matching what the original inner loop produced
    pattern.extend([0.00000001] * numberOfEntries)
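Putting the pieces together, the whole routine could look like the sketch below (build_patterns is an illustrative name, not from the original code; the outcomes slice follows the question's array[y+5:y+15]):

```python
def build_patterns(array, numberOfEntries):
    """Sketch combining the for-loop and precomputed-z ideas above."""
    condition = numberOfEntries + 1
    Patterns, outcomes = [], []
    for y in range(len(array)):
        pattern = []
        try:
            base = array[y - numberOfEntries]
            z = 100.00 / abs(base)  # precomputed: inverse of the divisor, times 100
            for i in range(1, condition):
                pattern.append((float(array[y - (numberOfEntries - i)]) - base) * z)
        except ZeroDivisionError:
            # sentinel values, as in the question's except branch
            pattern = [0.00000001] * numberOfEntries
        outcomes.append(array[y + 5:y + 15])  # the 10 "next outcomes" slice
        Patterns.append(pattern)
    return Patterns, outcomes
```

Note that for y < numberOfEntries the negative indices wrap around to the end of the list in Python, exactly as in the original code, so the first patterns mix values from both ends of the array.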
Upvotes: 2