user3053769

Reputation: 31

Peak to Trough in time series data

I am looking to find every instance in which a time series of a stock index declines by 10% or more. I am struggling to program it so that order matters: I only want declines, not appreciation by 10%.

Ideally the code would pick a value and check whether the value after it is at least 10% lower. If not, it checks the next value, and keeps checking until one is found, then records it. That "trough" or "valley" then becomes the new starting point, and the process repeats, checking whether the values after it are at least 10% below that value.

I have an Excel file with dates in the first column and the index value in the second.

This is my code so far. Based on a graph of the data, I don't think its output can be correct.

# Import libraries (only the ones this script uses)
import pandas as pd
import numpy as np
import peakutils

# Load the CSV export of the Excel file into a DataFrame
index = pd.read_csv(r"\Users\Reed_2\Desktop\Indexonly.csv")
print("as Pandas")
print(index.values)
# Convert the two columns to NumPy arrays
dates = index['Date'].to_numpy()
values = index['Index'].to_numpy()
print("values as NumPy")
print(values)
print("Date values")
print(dates)

# Find peaks
peaks = peakutils.indexes(values, thres=0.1, min_dist=1)

print("peaks")
print(peaks)

a = np.asarray(peaks)
np.savetxt(r"C:\Users\Reed_2\Desktop\export.csv", a, delimiter=",")

I have access to Python, RStudio, and MATLAB. I prefer Python, as I know it best.
I would be very grateful for any help on this.

Upvotes: 3

Views: 5359

Answers (1)

piRSquared

Reputation: 294488

Consider a series s built as the cumulative product of random lognormal returns, so that it behaves like a price series

np.random.seed([3,1415])
s = pd.Series(
    np.random.lognormal(.005, .5, size=100),
    pd.date_range('2015-01-01', periods=100, freq='B')
).cumprod()

s.plot()

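Use a generator to walk the index and yield each date on which the value has fallen more than thresh below the most recently yielded value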

def gen_slice(s, thresh):
    sidx = s.index
    idx = sidx[0]
    # reference value: the most recently yielded point
    v = s.at[idx]
    yield idx
    for idx in sidx[1:]:
        v0 = s.at[idx]
        # yield only declines of more than thresh from the reference
        if (v0 / v) < 1 - thresh:
            v = v0
            yield idx


s.loc[list(gen_slice(s, .1))]

2015-01-01    0.346504
2015-01-02    0.184687
2015-01-05    0.069298
2015-01-06    0.022508
2015-01-07    0.018996
2015-01-26    0.014204
2015-02-03    0.012777
2015-05-01    0.008999
2015-05-04    0.006039
2015-05-06    0.004855
dtype: float64
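
To make the rule easier to check by hand, here is a minimal sketch that applies the same generator logic to a small made-up series. The prices and dates are invented for illustration and are not real index data.

```python
import pandas as pd

def gen_slice(s, thresh):
    """Yield the first label, then each label whose value is more than
    `thresh` below the most recently yielded value."""
    labels = s.index
    ref = s.at[labels[0]]
    yield labels[0]
    for label in labels[1:]:
        value = s.at[label]
        if value / ref < 1 - thresh:
            ref = value
            yield label

# Invented toy prices, not real index data
toy = pd.Series(
    [100.0, 95.0, 89.0, 92.0, 80.0, 85.0, 70.0],
    index=pd.date_range('2015-01-01', periods=7, freq='B'),
)

events = toy.loc[list(gen_slice(toy, 0.1))]
print(events.tolist())  # [100.0, 89.0, 80.0, 70.0]
```

Because the comparison is a strict `<`, a drop of exactly 10% is not yielded. Changing it to `<=` would match the "10% or more" wording of the question.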

We can see that every percentage change between consecutive recorded points is a decline of more than 10% (each value is below -0.10)

s.loc[list(gen_slice(s, .1))].pct_change()

2015-01-01         NaN
2015-01-02   -0.467000
2015-01-05   -0.624783
2015-01-06   -0.675194
2015-01-07   -0.156034
2015-01-26   -0.252278
2015-02-03   -0.100442
2015-05-01   -0.295665
2015-05-04   -0.328967
2015-05-06   -0.195990
dtype: float64

We can plot where those events happened.

idx = list(gen_slice(s, .1))

ax = s.plot()
ax.vlines(idx, s.min(), s.max())
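
Note that `gen_slice` measures each decline from the previously recorded point, which is not necessarily the highest value seen since then. If a decline measured from the running peak is what is wanted, one possible variant is sketched below. It implements a different rule from the generator above, and the name `gen_drawdowns` and the toy values are hypothetical.

```python
import pandas as pd

def gen_drawdowns(s, thresh):
    """Yield (peak_label, trigger_label) pairs where the value at
    trigger_label is at least `thresh` below the running peak."""
    peak_label, peak = None, None
    for label, value in s.items():
        if peak is None or value > peak:
            # New high: move the reference point up
            peak_label, peak = label, value
        elif value / peak <= 1 - thresh:
            # First point to breach the threshold (may not be the final low)
            yield peak_label, label
            # Restart the search from the trigger point
            peak_label, peak = label, value

# Invented toy prices
toy = pd.Series([100.0, 105.0, 97.0, 94.0, 99.0, 88.0])
pairs = list(gen_drawdowns(toy, 0.1))
print(pairs)  # [(1, 3), (4, 5)]
```

On this toy series, measuring from the first value instead of the running peak would flag only the last point (88 against 100), because 94 is within 10% of 100 but not of the peak at 105.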

A more general treatment of the same idea follows:

It may become necessary to traverse the elements of a series or the rows of a dataframe in a way that the next element or next row is dependent on the previously selected element or row. This is called path dependency.

Consider the following time series s with irregular frequency.

# standard community import conventions
import numpy as np
import pandas as pd

# n is number of observations
n = 5000

day = pd.Timestamp('2013-02-06')
# irregular offsets spanning 28800 seconds (8 hours)
seconds = pd.to_timedelta(np.random.rand(n) * 28800, unit='s')
# start at 8 am
start = pd.Timedelta(hours=8)
# irregular timeseries, sorted chronologically
tidx = (day + start + seconds).sort_values()

s = pd.Series(np.random.randn(n), tidx, name='A').cumsum()
s.plot();

Let's assume a path-dependent condition. Starting with the first member of the series, I want to grab each subsequent element such that the absolute difference between that element and the most recently grabbed element is greater than or equal to x (the move_size parameter in the code).

We'll solve this problem using a Python generator.

Generator function

def mover(s, move_size=10):
    """Given a reference, find next value with
    an absolute difference >= move_size"""
    ref = None
    for i, v in s.items():
        # the first element is always yielded; after that, only
        # elements far enough from the last yielded value
        if ref is None or (abs(ref - v) >= move_size):
            yield i, v
            ref = v
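
As a small worked example, the sketch below runs the same generator logic on an invented six-element series and contrasts it with a vectorized comparison of adjacent elements. The toy values are assumptions chosen so the result can be checked by hand.

```python
import pandas as pd

def mover(s, move_size=10):
    """Yield (label, value) for the first element and for each later
    element at least move_size away from the last yielded value."""
    ref = None
    for i, v in s.items():
        if ref is None or abs(ref - v) >= move_size:
            yield i, v
            ref = v

# Invented toy values
toy = pd.Series([0, 3, 11, 12, 0, 25])

# Path-dependent selection: compares against the last selected value
selected = [i for i, _ in mover(toy, move_size=10)]
print(selected)  # [0, 2, 4, 5]

# Vectorized neighbor comparison: only sees jumps between adjacent rows
neighbor_jumps = toy[toy.diff().abs() >= 10].index.tolist()
print(neighbor_jumps)  # [4, 5]
```

Label 2 (value 11) is reached through two small steps, so the adjacent-difference check misses it while the path-dependent generator picks it up. That is why a plain `diff` is not a substitute here.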

Then we can define a new series moves like so

moves = pd.Series({i:v for i, v in mover(s, move_size=10)},
                  name='_{}_'.format(s.name))

Plotting them both

moves.plot(legend=True)
s.plot(legend=True)

The analog for dataframes would be:

def mover_df(df, col, move_size=2):
    ref = None
    for i, row in df.iterrows():
        if ref is None or (abs(ref - row.loc[col]) >= move_size):
            yield row
            ref = row.loc[col]

df = s.to_frame()
moves_df = pd.concat(mover_df(df, 'A', 10), axis=1).T

moves_df.A.plot(label='_A_', legend=True)
df.A.plot(legend=True)
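
Because `iterrows` builds a new Series for every row, and rows with mixed column types come back as `object`, an alternative is to iterate over only the column of interest, collect the matching labels, and select the rows with a single `.loc` call. The following is a minimal sketch under the assumption that the index labels are unique; `mover_labels` and the toy frame are hypothetical names introduced here.

```python
import pandas as pd

def mover_labels(df, col, move_size=2):
    """Yield index labels of rows selected by the same rule as mover_df."""
    ref = None
    for label, value in df[col].items():
        if ref is None or abs(ref - value) >= move_size:
            yield label
            ref = value

# Invented toy frame; column B is only there to show dtypes survive
toy_df = pd.DataFrame({'A': [0, 3, 11, 12, 0, 25],
                       'B': list('uvwxyz')})

picked = toy_df.loc[list(mover_labels(toy_df, 'A', 10))]
print(picked['A'].tolist())  # [0, 11, 0, 25]
print(picked.dtypes.equals(toy_df.dtypes))  # True
```

With the series from above, the equivalent call would be `df.loc[list(mover_labels(df, 'A', 10))]`.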

Upvotes: 6
