Reputation: 31
I want to find every instance in which a time series of a stock index declines by 10% or more. I am struggling to program it so that order matters: only declines should count, not appreciation of 10%.
Ideally the code would pick a value and check whether the value after it is at least 10% lower. If not, it checks the next value, and keeps checking until one is found, then records it. It then moves to that "trough" or "valley" and uses it as the new starting point, checking whether later values are 10% or more below it.
I have an Excel file with dates in the first column and the index value in the second.
This is my code; I don't think its output can be correct, judging by a graph of the data.
# Import Libraries
import pandas as pd
import numpy as np
import peakutils
from peakutils.plot import plot as pplot
from matplotlib import pyplot
import matplotlib.pyplot as plt
from scipy import signal
import csv
import scipy
import plotly.plotly as py
import plotly.graph_objs as go
from plotly.tools import FigureFactory as FF
# from pandas import DataFrame
# Import Excel as array
index = pd.read_csv(r"\Users\Reed_2\Desktop\Indexonly.csv")
print("as Pandas")
print (index.values)
# convert to 2 NumPy arrays
# .as_matrix() was removed in pandas 1.0; .to_numpy() is the replacement
dates = index['Date'].to_numpy()
values = index['Index'].to_numpy()
print("values as NumPy")
print(values)
print("Date values")
print(dates)
# Find peaks
peaks = peakutils.indexes(values, thres=0.1, min_dist=1)
print ("peaks")
print(peaks)
a = np.asarray(peaks)
np.savetxt(r"C:\Users\Reed_2\Desktop\export.csv", a, delimiter=",")
I have access to Python, RStudio, and MATLAB. I prefer Python, as I know it best.
Very grateful for any help on this.
Upvotes: 3
Views: 5359
Reputation: 294488
Consider the series s, built as the cumulative product of random lognormal returns
import numpy as np
import pandas as pd
np.random.seed([3,1415])
s = pd.Series(
    np.random.lognormal(.005, .5, size=100),
    pd.date_range('2015-01-01', periods=100, freq='B')
).cumprod()
s.plot()
Use a generator to slice the index
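def gen_slice(s, thresh):
    sidx = s.index
    idx = s.index[0]
    # reference value: starts at the first observation
    # (.at replaces get_value, which was removed in pandas 1.0)
    v = s.at[idx]
    yield idx
    for idx in sidx[1:]:
        v0 = s.at[idx]
        # a drop of more than `thresh` relative to the reference
        if (v0 / v) < 1 - thresh:
            # the new low becomes the reference
            v = v0
            yield idx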
s.loc[list(gen_slice(s, .1))]
2015-01-01 0.346504
2015-01-02 0.184687
2015-01-05 0.069298
2015-01-06 0.022508
2015-01-07 0.018996
2015-01-26 0.014204
2015-02-03 0.012777
2015-05-01 0.008999
2015-05-04 0.006039
2015-05-06 0.004855
dtype: float64
We can see that every percentage change is below -10%, that is, each selected point is a decline of more than 10% from the previous one
s.loc[list(gen_slice(s, .1))].pct_change()
2015-01-01 NaN
2015-01-02 -0.467000
2015-01-05 -0.624783
2015-01-06 -0.675194
2015-01-07 -0.156034
2015-01-26 -0.252278
2015-02-03 -0.100442
2015-05-01 -0.295665
2015-05-04 -0.328967
2015-05-06 -0.195990
dtype: float64
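For a hand-checkable illustration, the sketch below applies the same rule to a small made-up frame whose column names, Date and Index, follow the question's file. The prices and the helper name decline_points are assumptions for illustration only; this is a sketch of the idea, not a tested solution for the real data.

```python
import pandas as pd

# Made-up prices for illustration; column names follow the question's CSV.
demo = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=7, freq="D"),
    "Index": [100.0, 95.0, 89.0, 92.0, 80.0, 85.0, 70.0],
})

def decline_points(s, thresh):
    """Yield the first label, then each label whose value is more than
    `thresh` below the most recently yielded value (hypothetical helper)."""
    labels = iter(s.index)
    first = next(labels)
    ref = s.at[first]
    yield first
    for label in labels:
        value = s.at[label]
        if value / ref < 1 - thresh:
            ref = value
            yield label

prices = demo.set_index("Date")["Index"]
troughs = prices.loc[list(decline_points(prices, 0.1))]
print(troughs)
print(troughs.pct_change())
```

The selected values are 100, 89, 80 and 70. The rise from 89 to 92 is skipped, because only declines relative to the last recorded value count.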
We can plot where those events happened.
idx = list(gen_slice(s, .1))
ax = s.plot()
ax.vlines(idx, s.min(), s.max())
See also below:
It may become necessary to traverse the elements of a series or the rows of a dataframe in a way that the next element or next row is dependent on the previously selected element or row. This is called path dependency.
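As a small illustration of why such problems resist a purely vectorized treatment, the sketch below compares two selections on made-up numbers. One tests each element against its immediate predecessor, which diff handles directly. The other tests each element against the last selected element, which needs a loop. The data and the threshold of 10 are assumptions chosen for easy checking.

```python
import pandas as pd

# Made-up values for illustration.
demo = pd.Series([0, 4, 8, 12, 13, 2])
x = 10

# Not path dependent: each element is compared with the one just before it.
step_moves = demo[demo.diff().abs() >= x]

# Path dependent: each element is compared with the last *selected* element,
# so the reference changes as the selection proceeds.
selected = []
ref = None
for label, value in demo.items():
    if ref is None or abs(value - ref) >= x:
        selected.append(label)
        ref = value
path_moves = demo.loc[selected]

print(step_moves.index.tolist())
print(path_moves.index.tolist())
```

The first selection keeps only position 5, while the path-dependent one keeps positions 0, 3 and 5: the slow climb from 0 to 12 never moves 10 in a single step, but it does move 10 away from the reference.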
Consider the following time series s with irregular frequency.
#starting python community conventions
import numpy as np
import pandas as pd
# n is number of observations
n = 5000
day = pd.to_datetime(['2013-02-06'])
# irregular seconds spanning 28800 seconds (8 hours)
seconds = np.random.rand(n) * 28800 * pd.Timedelta(1, 's')
# start at 8 am
start = pd.offsets.Hour(8)
# irregular timeseries
tidx = day + start + seconds
tidx = tidx.sort_values()
s = pd.Series(np.random.randn(n), tidx, name='A').cumsum()
s.plot();
Let's assume a path dependent condition. Starting with the first member of the series, I want to grab each subsequent element such that the absolute difference between that element and the current element is greater than or equal to x.
We'll solve this problem using python generators.
Generator function
def mover(s, move_size=10):
    """Given a reference, find next value with
    an absolute difference >= move_size"""
    ref = None
    # .items() replaces .iteritems(), which was removed in pandas 2.0
    for i, v in s.items():
        if ref is None or (abs(ref - v) >= move_size):
            yield i, v
            ref = v
Then we can define a new series moves like so
moves = pd.Series({i:v for i, v in mover(s, move_size=10)},
                  name='_{}_'.format(s.name))
Plotting them both
moves.plot(legend=True)
s.plot(legend=True)
The analog for dataframes would be:
def mover_df(df, col, move_size=2):
    ref = None
    for i, row in df.iterrows():
        if ref is None or (abs(ref - row.loc[col]) >= move_size):
            yield row
            ref = row.loc[col]
df = s.to_frame()
moves_df = pd.concat(mover_df(df, 'A', 10), axis=1).T
moves_df.A.plot(label='_A_', legend=True)
df.A.plot(legend=True)
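One caveat with collecting whole rows: iterrows returns each row as a Series, so column dtypes are not preserved across rows when a frame mixes types. A possible variant, sketched below with a hypothetical helper name mover_labels and made-up data, yields only the index labels and lets .loc do the selection, which keeps the original dtypes.

```python
import pandas as pd

def mover_labels(df, col, move_size=2):
    """Yield index labels where `col` has moved at least `move_size`
    away from the last yielded value (hypothetical helper)."""
    ref = None
    for label, value in df[col].items():
        if ref is None or abs(ref - value) >= move_size:
            yield label
            ref = value

# Made-up frame with mixed dtypes for illustration.
demo_df = pd.DataFrame({"A": [0.0, 3.0, 11.0, 12.0, 1.0], "tag": list("abcde")})
demo_moves = demo_df.loc[list(mover_labels(demo_df, "A", 10))]
print(demo_moves)
print(demo_moves.dtypes)
```

Here rows 0, 2 and 4 are selected, and column A stays float64 alongside the string column.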
Upvotes: 6