Is there an xarray way of computing quantiles on a DataArray.rolling window? The listed available methods include mean and median, but nothing on quantiles/percentiles. I was wondering whether this could be done somehow, even though there is no direct way.
Currently, I am locally migrating the xarray data to a pandas.DataFrame, where I apply the rolling().quantile() sequence. After that, I take the values of the new DataFrame and build an xarray.DataArray from it. The reproducible code:
import xarray as xr
import pandas as pd
import numpy as np

times = np.arange(0, 30)
locs = ['A', 'B', 'C', 'D']
signal = xr.DataArray(np.random.rand(len(times), len(locs)),
                      coords=[times, locs], dims=['time', 'locations'])
window = 5

# Round trip through pandas: rolling 25% quantile along the rows (the default
# axis), then drop the incomplete edge windows.
df = pd.DataFrame(data=signal.data)
roll = df.rolling(window=window, center=True).quantile(.25).dropna()

# Rebuild a DataArray; note that the time coordinate is renumbered from 0.
window_array = xr.DataArray(roll.values,
                            coords=[np.arange(0, signal.time.shape[0] - window + 1), signal.locations],
                            dims=['time', 'locations'])
Any pointers for staying within xarray as much as possible are welcome.
Let us consider the same problem, only smaller in size (10 time instances, 2 locations).
Here is the output of the first method (via pandas):
<xarray.DataArray (time: 8, locations: 2)>
array([[0.404362, 0.076203],
       [0.353639, 0.076203],
       [0.387167, 0.102917],
       [0.525404, 0.298231],
       [0.755646, 0.298231],
       [0.460749, 0.414935],
       [0.104887, 0.498813],
       [0.104887, 0.420935]])
Coordinates:
  * time       (time) int32 0 1 2 3 4 5 6 7
  * locations  (locations) <U1 'A' 'B'
Note that the 'time' dimension is smaller, due to calling dropna() on the rolling result. The new dimension size is len(times) - window + 1. Now, the output for the proposed method (via construct):
<xarray.DataArray (time: 10, locations: 2)>
array([[0.438426, 0.127881],
       [0.404362, 0.076203],
       [0.353639, 0.076203],
       [0.387167, 0.102917],
       [0.525404, 0.298231],
       [0.755646, 0.298231],
       [0.460749, 0.414935],
       [0.104887, 0.498813],
       [0.104887, 0.420935],
       [0.112651, 0.60338 ]])
Coordinates:
  * time       (time) int32 0 1 2 3 4 5 6 7 8 9
  * locations  (locations) <U1 'A' 'B'
It seems like the dimensions are still (time, locations), with the size of the former equal to 10, not 8. In the example here, since center=True, the two results are the same if you remove the first and the last rows of the second array. Shouldn't the DataArray have a new dimension, tmp?
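The shapes involved can be checked directly. The sketch below is only an illustration on random data: it assumes a window of 3 (the window of the small example is not stated; 3 is inferred from the 10-versus-8 row counts) and uses made-up variable names. It suggests that construct does add the tmp dimension and that quantile(..., dim='tmp') then reduces it away, which would explain why only (time, locations) remains. The edge windows are padded with NaN, and quantile skips NaN by default for float data, so the first and last rows appear to be computed from partial windows rather than dropped.

```python
import numpy as np
import xarray as xr

# Small random example: 10 time steps, 2 locations (values are arbitrary).
rng = np.random.default_rng(0)
times = np.arange(10)
locs = ['A', 'B']
signal = xr.DataArray(rng.random((len(times), len(locs))),
                      coords=[times, locs], dims=['time', 'locations'])
window = 3  # assumption: inferred from the row counts, not stated in the post

# Intermediate array: every time step carries its own window along 'tmp'.
windows = signal.rolling(time=window, center=True).construct('tmp')
print(windows.dims, windows.shape)

# Edge windows reach past the data and are padded with NaN.
n_missing_first = int(windows.isel(time=0).isnull().sum())
n_missing_last = int(windows.isel(time=-1).isnull().sum())
print(n_missing_first, n_missing_last)

# Reducing over 'tmp' removes that dimension again.
q25 = windows.quantile(0.25, dim='tmp')
print(q25.dims, q25.sizes['time'])
```

Printing windows before the reduction is a quick way to see which rows are built from incomplete windows.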
Also, this method (with bottleneck installed) takes longer than the one initially proposed via pandas. For example, on a case study of 1000 times x 2 locations, the pandas run takes 0.015 s, while the construct one takes 1.25 s.
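Timings like these depend on the machine and on the installed library versions, so they are worth re-measuring locally. A minimal sketch of such a comparison follows, assuming the same 1000 x 2 case and using the standard-library timeit module; the function names via_pandas and via_construct are hypothetical, and no particular timing result is implied.

```python
import timeit

import numpy as np
import pandas as pd
import xarray as xr

# 1000 time steps x 2 locations, random values.
rng = np.random.default_rng(0)
signal = xr.DataArray(rng.random((1000, 2)),
                      coords=[np.arange(1000), ['A', 'B']],
                      dims=['time', 'locations'])
window = 5


def via_pandas():
    # Rolling quantile through a pandas round trip.
    df = pd.DataFrame(signal.values)
    return df.rolling(window=window, center=True).quantile(0.25)


def via_construct():
    # Rolling quantile through construct + quantile in xarray.
    return (signal.rolling(time=window, center=True)
            .construct('tmp')
            .quantile(0.25, dim='tmp'))


n_runs = 10
t_pandas = timeit.timeit(via_pandas, number=n_runs) / n_runs
t_construct = timeit.timeit(via_construct, number=n_runs) / n_runs
print(f"pandas:    {t_pandas:.4f} s per run")
print(f"construct: {t_construct:.4f} s per run")
```

Averaging over several runs smooths out one-off costs such as imports and caching.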
Upvotes: 2
You can use the construct method of the rolling object, which generates a new DataArray with the rolling dimension.
signal.rolling(time=window, center=True).construct('tmp').quantile(.25, dim='tmp')
Above, I constructed a DataArray with an additional tmp dimension and computed the quantile along this dimension.
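If the goal is to reproduce the pandas result exactly, including the dropped edge rows, one possible sketch is below. It assumes an xarray version in which quantile accepts skipna=False, so that windows containing NaN padding yield NaN and can then be removed with dropna; the variable names are illustrative. Unlike the pandas round trip, the time coordinate keeps its original labels (2 to 27 here) rather than being renumbered from 0.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(42)
times = np.arange(30)
locs = ['A', 'B', 'C', 'D']
signal = xr.DataArray(rng.random((len(times), len(locs))),
                      coords=[times, locs], dims=['time', 'locations'])
window = 5

# Keep only complete windows: NaN-padded edge windows give NaN with
# skipna=False, and dropna then removes those time steps.
full_windows_only = (
    signal.rolling(time=window, center=True)
    .construct('tmp')
    .quantile(0.25, dim='tmp', skipna=False)
    .dropna('time')
    .transpose('time', 'locations')
)

# Reference result from the pandas round trip, for comparison.
reference = (
    pd.DataFrame(signal.values)
    .rolling(window=window, center=True)
    .quantile(0.25)
    .dropna()
)

print(full_windows_only.sizes['time'])  # len(times) - window + 1
print(np.allclose(full_windows_only.values, reference.values))
```

Trimming with isel(time=slice(window // 2, -(window // 2))) after the default NaN-skipping quantile would be another way to get the same rows for an odd window.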
Upvotes: 6