Khurram Khalil
Khurram Khalil

Reputation: 29

How to calculate moving / running / rolling arbitrary function (e.g. kurtosis & skewness) using NumPy / SciPy

I am working on the time-series data. To get features from data I have to calculate moving mean, median, mode, slop, kurtosis, skewness etc. I am familiar with scipy.stat which provides an easy way to calculate these quantities for straight calculation. But for the moving/running part, I have explored the whole internet and got nothing.

Surprisingly moving mean, median and mode are very easy to calculate with numpy. Unfortunately, there is no built-in function for calculating kurtosis and skewness. If someone can help, how to calculate moving kurtosis and skewness with scipy? Many thanks

Upvotes: 0

Views: 2522

Answers (3)

Flavio
Flavio

Reputation: 839

Based on the answer of @KhurramKhalil I would suggest a solution using list comprehension to avoid the for loop:

import numpy as np
from scipy.stats import kurtosis, skew

x = [1, 2, 4, 8, 9, 81, 25, 35, 1, 6]

def calculate_rolling_skewness(array_series: np.ndarray, window: int) -> np.ndarray:
    # According to the series size, set the calculable windows
    positions_to_be_calculated = len(array_series) - (window - 1)

    # Kurtosis calculation for each window
    skewness_series = [skew(array_series[i:(i + window)]) for i in range(positions_to_be_calculated)]

    # Concatenate the series to have the same initial array size
    skewness_series = np.concatenate((np.full(window-1, np.nan), skewness_series), axis=0)

    return skewness_series

 

calculate_rolling_skewness(array_series=x, window=window)

Results:

array([        nan,         nan,  0.38180177,  0.38180177, -0.59517006,
        0.70665284,  0.56731658,  0.61897406, -0.45563859,  0.6485518 ])

You can do the same using the kurtosis function as well.

Upvotes: 0

norok2
norok2

Reputation: 26916

Pandas offers a DataFrame.rolling() method which can be used, in combination with its Rolling.apply() method (i.e. df.rolling().apply()) to apply an arbitrary function to the specified rolling window.


If you are looking for NumPy-based solution, you could use FlyingCircus Numeric (disclaimer: I am the main author of it).

There, you could find the following:

  1. flyingcircus_numeric.running_apply(): can apply any function to a 1D array and supports weights, but it is slow;
  2. flyingcircus_numeric.moving_apply(): can apply any function supporting a axis: int parameter to a 1D array and supports weights, and it is fast (but memory-hungry);
  3. flyingcircus_numeric.rolling_apply_nd(): can apply any function supporting a axis: int|Sequence[int] parameter to any ND array and it is fast (and memory-efficient), but it does not support weights.

Based on your requirements, I would suggest to use rolling_apply_nd(), e.g.:

import numpy as np
import scipy as sp
import flyingcircus_numeric as fcn

import scipy.stats


NUM = 30
arr = np.arange(NUM)

window = 4
new_arr = fcn.rolling_apply_nd(arr, window, func=sp.stats.kurtosis)
print(new_arr)
# [-1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36
#  -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36
#  -1.36 -1.36 -1.36]

Of course, feel free to inspect the source code, it is open source (GPL).


EDIT

Just to get a feeling of the kind of speed we are talking about, these are the benchmarks for the solutions implemented in FlyingCircus:

benchmark benchmark_zoom

The general approach flyingcircus_numeric.running_apply() is a couple of orders of magnitude slower than either flyingcircus_numeric.rolling_apply_nd() or flyingcircus_numeric.moving_apply(), with the first being approx. one order of magnitude faster than the second. This shows the speed price for generality or support for weighting.

The above plots were obtained using the scripts from here and the following code:

import scipy as sp
import flyingcircus_numeric as fcn 
import scipy.stats


WINDOW = 4
FUNC = sp.stats.kurtosis


def my_rolling_apply_nd(arr, window=WINDOW, func=FUNC):
    return fcn.rolling_apply_nd(arr, window, func=FUNC)


def my_moving_apply(arr, window=WINDOW, func=FUNC):
    return fcn.moving_apply(arr, window, func)


def my_running_apply(arr, window=WINDOW, func=FUNC):
    return fcn.running_apply(arr, window, func)


def equal_output(a, b):
    return np.all(np.isclose(a, b))


input_sizes = (5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000)
funcs = my_rolling_apply_nd, my_moving_apply, my_running_apply

runtimes, input_sizes, labels, results = benchmark(
    funcs, gen_input=np.random.random, equal_output=equal_output,
    input_sizes=input_sizes)

plot_benchmarks(runtimes, input_sizes, labels, units='s')
plot_benchmarks(runtimes, input_sizes, labels, units='ms', zoom_fastest=8)

(EDITED to reflect some refactoring of FlyingCircus)

Upvotes: 2

Khurram Khalil
Khurram Khalil

Reputation: 29

After playing around, I have come up with a solution that is purely numpy and scipy based. Of course it is using scipy.stats kurtosis and skew.

import numpy as np
from scipy.stats import kurtosis, skew

# Window size
N = 4

# Some random data
m = np.array([2, 3, 10, 11, 0, 4, 8, 2, 5, 9])

# Running Kurtosis
def runningKurt(x, N):
    # Initilize placeholder array
    y = np.zeros((len(x) - (N - 1),))
    for i in range(len(x) - (N - 1)):

         y[i] = kurtosis(x[i:(i + N)])

    return y

# Running Kurtosis

def runningSkew(x, N):
    # Initilize placeholder array
    y = np.zeros((len(x) - (N - 1),))
    for i in range(len(x) - (N - 1)):

         y[i] = skew(x[i:(i + N)])

    return y

kurt = runningKurt(m, N)
print("kurtosis : ", kurt)
# kurtosis :  [-1.93940828 -1.77879935 -1.61464214 -1.40236694 -1.15428571 -1.07626667 -1.42666667]


skw = runningSkew(m, N)
print("skew : ", skw)
# skew :  [ 0.         -0.1354179  -0.26356495 -0.13814702  0.43465076  0.32331615 -0.36514837]

Upvotes: 1

Related Questions