Reputation: 123
For any given date, I am trying to find the most recent previous close value that was more than 1.2x the present close value. I made a loop that checks this for every row. However, it is not efficient: the runtime is 45 seconds. How do I make my code more efficient so it can work with a much larger dataset than this?
Dataset: TSLA or TSLA Daily 5Y Stock Yahoo
import os
import numpy as np
import pandas as pd

df = pd.read_csv(os.getcwd()+"\\TSLA.csv")
# Slicing the dataset
df2 = df[['Date', 'Close']]
irange = np.arange(1, len(df))
for i in irange:
    # Dicing first i rows
    df3 = df2.head(i)
    # Set the target close value that is 1.2x the current close value
    targetValue = 1.2 * df3['Close'].tail(1).values[0]
    # Check the last 200 days (indexed by date, without modifying the slice in place)
    df4 = df3.tail(200).set_index('Date')
    # Save all the target values in a list
    req = df4[df4['Close'] > targetValue]
    try:
        lent = (req.index.tolist()[-1])
    except IndexError:
        # No close in the window exceeded the target
        lent = str(9999999)
    # Save the last value to the main dataframe
    df.at[i,'last_time'] = lent
df.tail(20)
Upvotes: 0
Views: 102
Reputation: 1598
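Your loop re-slices the DataFrame on every iteration, which does far more work than necessary and makes some unnecessary data copies. Try this O(N log N) approach instead, which keeps a stack of closes in decreasing order and binary-searches it: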
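import pandas as pd

df = pd.read_csv("D:\\TSLA.csv")
stack,cnt=[],0
def OnePointTwoTimesLarger(row):
    # cnt is not really needed for what you asked, but it is usually better
    # to return the data row you need instead of just returning the value
    global stack,cnt
    c=row['Close']
    # Drop entries that are not larger than the current close; the stack
    # stays strictly decreasing in 'Close' from bottom to top
    while stack and stack[-1][1]<=c:
        stack.pop()
    stack.append([row['Date'],c])
    cnt+=1
    # Binary search for the boundary of entries larger than 1.2*c
    left,right=0,len(stack)-1
    while left<right-3:
        mid=(left+right)//2
        if stack[mid][1]>1.2*c:
            left=mid
        else:
            right=mid
    # Scan the small remaining range from newest to oldest
    for e in stack[left:right+1][::-1]:
        if e[1]>1.2*c:
            return e[0]
    # Sentinel: no earlier close was larger than 1.2*c
    return 999999
df['last_time']=df.apply(OnePointTwoTimesLarger, axis=1)
df.tail(60)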
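For reference, the same stack-plus-binary-search idea can be sketched without pandas or global state. This is only a minimal illustration, not a drop-in replacement: the function name `last_exceeding_index`, the `-1` sentinel, and the optional `window` argument (meant to mirror the 200-row lookback in the question, which the code above does not apply) are all assumptions made for this example. It works on a plain list of closes and returns row positions rather than dates.

```python
from bisect import bisect_left

def last_exceeding_index(closes, factor=1.2, window=None):
    """For each position i, return the position of the most recent earlier
    value strictly greater than factor * closes[i], or -1 if there is none.

    `window` (hypothetical parameter) limits how many rows back a match
    may lie; None means no limit.
    """
    idx_stack = []   # positions of the stacked closes
    neg_stack = []   # negated closes, kept ascending so bisect can be used
    out = []
    for i, c in enumerate(closes):
        # Stacked closes are strictly decreasing, so the ones larger than
        # factor*c form a prefix; k is the length of that prefix.
        k = bisect_left(neg_stack, -(factor * c))
        j = idx_stack[k - 1] if k > 0 else -1
        # The most recent match is the only candidate: if it is too old,
        # every other match is older still.
        if j != -1 and window is not None and i - j > window:
            j = -1
        out.append(j)
        # Drop entries not larger than c: c is newer and at least as large,
        # so they can never be the most recent match for a later row.
        while neg_stack and -neg_stack[-1] <= c:
            idx_stack.pop()
            neg_stack.pop()
        idx_stack.append(i)
        neg_stack.append(-c)
    return out

closes = [10, 13, 9, 10, 8, 12]
print(last_exceeding_index(closes))  # [-1, -1, 1, 1, 3, -1]
```

Each row is pushed and popped at most once and each lookup is one binary search, so the total cost is O(N log N). With a DataFrame, the returned positions could be mapped back to dates afterwards, for example with `df['Date'].iloc[j]` for every `j` that is not `-1`.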
Upvotes: 1