r_31415

Reputation: 8982

Window overlap in Pandas

In pandas, there are several methods to manipulate data in a given window (e.g. pd.rolling_mean or pd.rolling_std). However, I would like to set a window overlap, which I think is a pretty standard requirement. For example, in the following image, you can see a window spanning 256 samples and overlapping 128 samples.

http://health.tau.ac.il/Communication%20Disorders/noam/speech/mistorin/images/hamming_overlap1.JPG

How can I do that using the optimized methods included in Pandas or Numpy?

Upvotes: 8

Views: 5115

Answers (3)

MaMaG

Reputation: 379

As of numpy 1.20 (released a few months ago), there is a new, much more stable implementation of this:

https://numpy.org/doc/stable/reference/generated/numpy.lib.stride_tricks.sliding_window_view.html#numpy.lib.stride_tricks.sliding_window_view

To do a moving window with window size 3 and stride of 2, just do this (from the documentation):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(7)
sliding_window_view(x, 3)[::2, :]

I was looking at the responses here, and trying to use as_strided. It seemed to work fine with a float array I had. But then I tried to use it on a boolean array, and I got garbage out. Even after converting to ints or floats, I got the same thing (different garbage). But using sliding_window_view works. Yes, you first have to build the full set of step-1 windows and then subset it, but since sliding_window_view returns a view rather than a copy, that costs very little memory, and it works for what I need.
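
To tie this back to the window/overlap wording of the question, here is a minimal sketch that wraps the same call. The helper name overlapping_windows and the sample boolean array are made up for illustration; the only library function used is sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def overlapping_windows(x, window, overlap):
    """Return a read-only 2D view of shape (n_windows, window).

    Hypothetical helper, not part of NumPy.
    """
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    # Build all step-1 windows, then keep every `step`-th one.
    # Both operations return views, so no data is copied.
    return sliding_window_view(np.asarray(x), window)[::step]

# Boolean input, the case that went wrong with hand-rolled strides above.
flags = np.array([True, False, True, True, False, True, False])
wins = overlapping_windows(flags, 3, 1)
print(wins.shape)           # (3, 3)
print(wins.mean(axis=-1))   # fraction of True values in each window
```

Because the result is a view, reductions such as mean or std along the last axis only allocate the small output array.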

Upvotes: 1

Jaime

Reputation: 67467

Using as_strided you would do something like this:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def windowed_view(arr, window, overlap):
    arr = np.asarray(arr)
    window_step = window - overlap
    new_shape = arr.shape[:-1] + ((arr.shape[-1] - overlap) // window_step,
                                  window)
    new_strides = (arr.strides[:-1] + (window_step * arr.strides[-1],) +
                   arr.strides[-1:])
    return as_strided(arr, shape=new_shape, strides=new_strides)

If you pass a 1D array to the above function, it will return a 2D view into that array, with shape (number_of_windows, window_size), so you could calculate, e.g. the windowed mean as:

win_avg = np.mean(windowed_view(arr, win_size, win_overlap), axis=-1)

For example:

>>> a = np.arange(16)
>>> windowed_view(a, 4, 2)
array([[ 0,  1,  2,  3],
       [ 2,  3,  4,  5],
       [ 4,  5,  6,  7],
       [ 6,  7,  8,  9],
       [ 8,  9, 10, 11],
       [10, 11, 12, 13],
       [12, 13, 14, 15]])
>>> windowed_view(a, 4, 1)
array([[ 0,  1,  2,  3],
       [ 3,  4,  5,  6],
       [ 6,  7,  8,  9],
       [ 9, 10, 11, 12],
       [12, 13, 14, 15]])
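
Since as_strided does no bounds checking, it may be worth guarding the arguments and returning a read-only view. The following is a sketch of that idea, not a replacement for the function above: windowed_view_checked is a hypothetical name, and the comparison against sliding_window_view (NumPy 1.20 or later) is just a sanity check.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

def windowed_view_checked(arr, window, overlap):
    """Same shape/stride arithmetic as above, with validation (hypothetical helper)."""
    arr = np.asarray(arr)
    if not 0 <= overlap < window <= arr.shape[-1]:
        raise ValueError("need 0 <= overlap < window <= length of last axis")
    step = window - overlap
    n_windows = (arr.shape[-1] - overlap) // step
    shape = arr.shape[:-1] + (n_windows, window)
    # Strides are in bytes, so they are taken from arr.strides, never hard-coded.
    strides = arr.strides[:-1] + (step * arr.strides[-1], arr.strides[-1])
    # writeable=False prevents accidental writes through overlapping windows.
    return as_strided(arr, shape=shape, strides=strides, writeable=False)

a = np.arange(16)
view = windowed_view_checked(a, 4, 1)
reference = sliding_window_view(a, 4)[::3]  # step = window - overlap = 3
print(np.array_equal(view, reference))      # True
```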

Upvotes: 9

Bas Swinckels

Reputation: 18488

I am not familiar with pandas, but in numpy you would do something like this (untested):

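from numpy import hanning as hann  # Hann window; stands in for the unimported hann

def overlapped_windows(x, nwin, noverlap=None):
    if noverlap is None:
        noverlap = nwin // 2  # default: half-overlapping windows
    step = nwin - noverlap
    for i in range(0, len(x) - nwin + 1, step):
        window = x[i:i+nwin]  # this is a view, not a copy (if x is a NumPy array)
        y = window * hann(nwin)  # new array, x is left untouched
        # your code here with y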

This is ripped from some old code to calculate an averaged PSD, which you typically process with half-overlapping windows. Note that window is a 'view' into array x, which means it does not copy any data (very fast, so probably good) and that if you modify window in place you also modify x (so don't do window *= hann(nwin)).
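
As a rough illustration of the averaged-PSD use case, here is a sketch that turns the loop into a generator and averages the squared FFT magnitudes of the segments. The function names and the test signal are made up for this example, np.hanning is assumed as the Hann window, and no PSD normalization is applied, so the values are only meaningful relative to each other.

```python
import numpy as np

def windowed_segments(x, nwin, noverlap=None):
    """Yield Hann-tapered, overlapping segments of x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if noverlap is None:
        noverlap = nwin // 2  # half-overlapping by default
    step = nwin - noverlap
    taper = np.hanning(nwin)
    for i in range(0, len(x) - nwin + 1, step):
        # The slice is a view; multiplying creates a new array, so x is not modified.
        yield x[i:i + nwin] * taper

def averaged_power_spectrum(x, nwin, noverlap=None):
    """Average |rfft|^2 over all segments (unnormalized)."""
    spectra = [np.abs(np.fft.rfft(seg)) ** 2
               for seg in windowed_segments(x, nwin, noverlap)]
    return np.mean(spectra, axis=0)

t = np.arange(1024)
signal = np.sin(2 * np.pi * 0.125 * t)  # 0.125 cycles per sample
spectrum = averaged_power_spectrum(signal, 256)
print(spectrum.shape)     # (129,)
print(spectrum.argmax())  # 32, i.e. 0.125 * 256
```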

Upvotes: 2
