Reputation: 423
I'm trying to calculate the Hurst exponent of a time series in Python. It is a value used in quantitative finance to characterize the mean-reversion behavior of a time series. I've taken a time series of arbitrary length and split it into chunks, which is one step in one of the several methods for calculating the Hurst exponent. I'm writing this as a function. Suppose the time series (prices of a security) is "y" and the number of chunks I want is "n":
def hurst(y,n):
    y = array_split(y,n)
The problem is that the array is now split into chunks, and one of the chunks is not the same size as the others. I want to find the mean, standard deviation, mean-centered series, cumulative sum of the mean-centered series, and range of the cumulative sum for each chunk. But since the chunks are not uniform in size, I haven't found a way to accomplish this. Basically, when I try to pass
mean(y,axis=0)
or 1 or 2 for the axis, I get an error. With n=20, the shape of the array is reported as
(20,)
I thought maybe "vectorize" could help me, but I haven't quite figured out how to use it. I'm trying to avoid looping through the data.
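A minimal sketch of the situation described above, assuming NumPy and a made-up 30-point series (the data and the names `chunks`, `chunk_lengths`, and `chunk_means` are illustrative, not taken from the real prices). `np.array_split` returns a plain Python list of arrays, and when the length is not divisible by the number of chunks the pieces differ in length, so there is no regular 2-D array for an `axis` argument to act on. A reported shape of (20,) is consistent with a one-dimensional container of 20 separate arrays.

```python
import numpy as np

# Hypothetical stand-in for the price series (30 points, not real data).
y = np.linspace(1.0, 1.3, 30)

# 30 is not divisible by 4, so the chunks cannot all have the same length.
chunks = np.array_split(y, 4)
chunk_lengths = [len(c) for c in chunks]
print(type(chunks).__name__, chunk_lengths)

# Reducing each chunk on its own works regardless of its length.
chunk_means = [float(c.mean()) for c in chunks]
print(chunk_means)
```

This still iterates in Python over the chunks, but only once per chunk rather than once per data point.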
Sample data after it is split:
[array([[ 1.04676],
[ 1.0366 ],
[ 1.0418 ],
[ 1.0536 ],
[ 1.0639 ],
[ 1.06556],
[ 1.0668 ]]), array([[ 1.056 ],
[ 1.053 ],
[ 1.0521 ],
[ 1.0517 ],
[ 1.0551 ],
[ 1.0485 ],
[ 1.05705]]), array([[ 1.0531],
[ 1.0545],
[ 1.0682],
[ 1.08 ],
[ 1.0728],
[ 1.061 ],
[ 1.0554]]), array([[ 1.0642],
[ 1.0607],
[ 1.0546],
[ 1.0521],
[ 1.0548],
[ 1.0647],
[ 1.0604]])
The data type is list.
Upvotes: 3
Views: 317
Reputation: 423
For anybody who stumbles across this: I've solved the problem by resorting to a pandas DataFrame instead.
import numpy as np
import pandas as pd
from scipy import stats

def hurst(y, n):
    # Flatten the input to a 1-D float array (as_matrix() no longer exists in pandas)
    y = np.asarray(y, dtype=float).ravel()
    y = np.array_split(y, n)
    # One column per chunk; dropna() discards rows that the shorter chunks lack
    y = pd.DataFrame.from_records(y).transpose()
    y = y.dropna()
    # Mean Centered Series
    m = y.mean(axis='columns')
    Y = y.sub(m, axis='rows')
    # Standard Deviation of Series
    S = y.std(axis='columns')
    # Cumulative Sum Series
    Z = Y.cumsum()
    # Range Series
    R = Z.max(axis='columns') - Z.min(axis='columns')
    # Rescale Range
    RS = R / S
    RS = RS.sort_values()
    # Time Period
    s = np.shape(y)
    t = np.linspace(1, s[0], s[0])
    # Log Scales
    logt = np.log10(t)
    logRS = np.log10(RS)
    print(len(t), len(logRS))
    # Regression Fit
    slope, intercept, r_value, p_value, std_err = stats.mstats.linregress(logt, logRS)
    # Hurst Exponent
    H = slope / 2
    return H, logt, logRS
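As a possible alternative, and not the approach used in the function above, the per-chunk quantities listed in the question can be computed without a Python loop if the trailing remainder is dropped so that every chunk has the same length. The sketch below assumes that discarding up to n-1 trailing points is acceptable. The function name `chunk_stats` is hypothetical, and the population standard deviation (`ddof=0`) is an assumption; pandas' `std` defaults to `ddof=1`.

```python
import numpy as np

def chunk_stats(y, n):
    """Per-chunk statistics for n equal-length chunks (trailing remainder dropped)."""
    y = np.asarray(y, dtype=float).ravel()
    size = len(y) // n                         # points per chunk
    x = y[:size * n].reshape(n, size)          # one row per chunk
    mean = x.mean(axis=1)                      # mean of each chunk
    std = x.std(axis=1, ddof=0)                # population std of each chunk
    centered = x - mean[:, None]               # mean-centered series
    cumdev = centered.cumsum(axis=1)           # cumulative sum of deviations
    rng = cumdev.max(axis=1) - cumdev.min(axis=1)  # range of the cumulative sum
    return mean, std, cumdev, rng

# Made-up data: 10 points split into 3 chunks of 3, the 10th point is dropped.
y = np.arange(1.0, 11.0)
mean, std, cumdev, rng = chunk_stats(y, 3)
print(mean, std, rng)
```

Because the data now form a regular 2-D array, `axis=1` works for every reduction.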
Upvotes: 0
Reputation: 8464
To make a list of averages, you can simply use a list comprehension:
[mean(chunk) for chunk in x]
It goes over the chunks in x and computes the mean of each one.
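The same pattern extends to the other per-chunk quantities from the question. The sketch below assumes NumPy and uses a hypothetical helper, `rescaled_range`, applied to made-up data. It computes the range of the cumulative mean-centered series divided by the population standard deviation, and it works for chunks of unequal length.

```python
import numpy as np

def rescaled_range(chunk):
    """R/S of one chunk: range of cumulative deviations over the standard deviation."""
    chunk = np.asarray(chunk, dtype=float).ravel()
    deviations = chunk - chunk.mean()          # mean-centered series
    cumulative = deviations.cumsum()           # cumulative sum of deviations
    return (cumulative.max() - cumulative.min()) / chunk.std()

# Made-up data: 7 points split into 2 chunks of lengths 4 and 3.
x = np.array_split(np.array([1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0]), 2)
rs_values = [rescaled_range(chunk) for chunk in x]
print(rs_values)
```

A chunk whose values are all identical has zero standard deviation, so the division would need guarding in real use.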
Upvotes: 1