Reputation: 29
I am working on the time-series data. To get features from data I have to calculate moving mean, median, mode, slop, kurtosis, skewness etc. I am familiar with scipy.stat
which provides an easy way to calculate these quantities for straight calculation. But for the moving/running part, I have explored the whole internet and got nothing.
Surprisingly moving mean, median and mode are very easy to calculate with numpy
. Unfortunately, there is no built-in function for calculating kurtosis and skewness.
If someone can help, how to calculate moving kurtosis and skewness with scipy? Many thanks
Upvotes: 0
Views: 2522
Reputation: 839
Based on the answer of @KhurramKhalil I would suggest a solution using list comprehension to avoid the for
loop:
import numpy as np
from scipy.stats import kurtosis, skew
x = [1, 2, 4, 8, 9, 81, 25, 35, 1, 6]
def calculate_rolling_skewness(array_series: np.ndarray, window: int) -> np.ndarray:
# According to the series size, set the calculable windows
positions_to_be_calculated = len(array_series) - (window - 1)
# Kurtosis calculation for each window
skewness_series = [skew(array_series[i:(i + window)]) for i in range(positions_to_be_calculated)]
# Concatenate the series to have the same initial array size
skewness_series = np.concatenate((np.full(window-1, np.nan), skewness_series), axis=0)
return skewness_series
calculate_rolling_skewness(array_series=x, window=window)
Results:
array([ nan, nan, 0.38180177, 0.38180177, -0.59517006,
0.70665284, 0.56731658, 0.61897406, -0.45563859, 0.6485518 ])
You can do the same using the kurtosis
function as well.
Upvotes: 0
Reputation: 26916
Pandas offers a DataFrame.rolling()
method which can be used, in combination with its Rolling.apply()
method (i.e. df.rolling().apply()
) to apply an arbitrary function to the specified rolling window.
If you are looking for NumPy-based solution, you could use FlyingCircus Numeric (disclaimer: I am the main author of it).
There, you could find the following:
flyingcircus_numeric.running_apply()
: can apply any function to a 1D array and supports weights, but it is slow;flyingcircus_numeric.moving_apply()
: can apply any function supporting a axis: int
parameter to a 1D array and supports weights, and it is fast (but memory-hungry);flyingcircus_numeric.rolling_apply_nd()
: can apply any function supporting a axis: int|Sequence[int]
parameter to any ND array and it is fast (and memory-efficient), but it does not support weights.Based on your requirements, I would suggest to use rolling_apply_nd()
, e.g.:
import numpy as np
import scipy as sp
import flyingcircus_numeric as fcn
import scipy.stats
NUM = 30
arr = np.arange(NUM)
window = 4
new_arr = fcn.rolling_apply_nd(arr, window, func=sp.stats.kurtosis)
print(new_arr)
# [-1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36
# -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36
# -1.36 -1.36 -1.36]
Of course, feel free to inspect the source code, it is open source (GPL).
Just to get a feeling of the kind of speed we are talking about, these are the benchmarks for the solutions implemented in FlyingCircus:
The general approach flyingcircus_numeric.running_apply()
is a couple of orders of magnitude slower than either flyingcircus_numeric.rolling_apply_nd()
or flyingcircus_numeric.moving_apply()
, with the first being approx. one order of magnitude faster than the second.
This shows the speed price for generality or support for weighting.
The above plots were obtained using the scripts from here and the following code:
import scipy as sp
import flyingcircus_numeric as fcn
import scipy.stats
WINDOW = 4
FUNC = sp.stats.kurtosis
def my_rolling_apply_nd(arr, window=WINDOW, func=FUNC):
return fcn.rolling_apply_nd(arr, window, func=FUNC)
def my_moving_apply(arr, window=WINDOW, func=FUNC):
return fcn.moving_apply(arr, window, func)
def my_running_apply(arr, window=WINDOW, func=FUNC):
return fcn.running_apply(arr, window, func)
def equal_output(a, b):
return np.all(np.isclose(a, b))
input_sizes = (5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000)
funcs = my_rolling_apply_nd, my_moving_apply, my_running_apply
runtimes, input_sizes, labels, results = benchmark(
funcs, gen_input=np.random.random, equal_output=equal_output,
input_sizes=input_sizes)
plot_benchmarks(runtimes, input_sizes, labels, units='s')
plot_benchmarks(runtimes, input_sizes, labels, units='ms', zoom_fastest=8)
(EDITED to reflect some refactoring of FlyingCircus)
Upvotes: 2
Reputation: 29
After playing around, I have come up with a solution that is purely numpy
and scipy
based. Of course it is using scipy.stats kurtosis
and skew
.
import numpy as np
from scipy.stats import kurtosis, skew
# Window size
N = 4
# Some random data
m = np.array([2, 3, 10, 11, 0, 4, 8, 2, 5, 9])
# Running Kurtosis
def runningKurt(x, N):
# Initilize placeholder array
y = np.zeros((len(x) - (N - 1),))
for i in range(len(x) - (N - 1)):
y[i] = kurtosis(x[i:(i + N)])
return y
# Running Kurtosis
def runningSkew(x, N):
# Initilize placeholder array
y = np.zeros((len(x) - (N - 1),))
for i in range(len(x) - (N - 1)):
y[i] = skew(x[i:(i + N)])
return y
kurt = runningKurt(m, N)
print("kurtosis : ", kurt)
# kurtosis : [-1.93940828 -1.77879935 -1.61464214 -1.40236694 -1.15428571 -1.07626667 -1.42666667]
skw = runningSkew(m, N)
print("skew : ", skw)
# skew : [ 0. -0.1354179 -0.26356495 -0.13814702 0.43465076 0.32331615 -0.36514837]
Upvotes: 1