Reputation: 6909
I have a 1-D NumPy array where I create a rolling window and then compute the np.nanstd
:
import numpy as np
def rolling_window(a, window):
a = np.asarray(a)
shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
strides = a.strides + (a.strides[-1],)
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
if __name__ == "__main__":
n = 100_000_000
nan_indices = np.random.choice(np.arange(n), size=1000, replace=False)
T = np.random.rand(n)
T[nan_indices] = np.nan
m = 50
np.nanstd(rolling_window(T, m), axis=T.ndim)
However, I noticed that not only is this extremely time consuming, it also uses a lot of memory. Is there a way to improve both the memory and speed performance (Numba is an option)?
Upvotes: 1
Views: 470
Reputation: 221754
After grueling through the math, here's what I ended up with few np.convolve
and some masking to get a vectorized NumPy solution -
def nanstd(a, W):
k = np.ones(W, dtype=int)
m = ~np.isnan(a)
a0 = np.where(m, a,0)
n = np.convolve(m,k,'valid')
c1 = np.convolve(a0, k,'valid')
f2 = c1**2
p2 = f2/n**2
f1 = np.convolve((a0**2)*m,k,'valid')+n*p2
out = np.sqrt((f1 - (2/n)*f2)/n)
return out
Here's the equivalent pandas version, which isn't too bad on performance -
import pandas as pd
def pdroll(T,m):
return pd.Series(T).rolling(m).std(ddof=0).values[m-1:]
Using benchit
package (few benchmarking tools packaged together; disclaimer: I am its author) to benchmark proposed solutions.
def setup(n):
nan_indices = np.random.choice(np.arange(n), size=10, replace=False)
T = np.random.rand(n)
T[nan_indices] = np.nan
return T
import benchit
f = {'rolling': lambda T,m: np.nanstd(rolling_window(T, m), axis=T.ndim),
'pdroll': pdroll, 'conv':nanstd}
in_={(n,w):(setup(n),w) for n in 10**np.arange(2,6) for w in [5,10,20,50,80,100]}
t = benchit.timings(f, in_, multivar=True)
t.plot(logx=True, sp_ncols=2, save='timings.png', dpi=200)
NumPy one is good on smaller window sizes, while pandas is better on larger ones.
NumPy vectorized : Explanation on NumPy version nanstd
Basically, np.nanstd
is computing std
ignoring NaNs
. Now, std
can be computed based on mean
.
Thus, for an array a
with no NaNs
, it would be :
np.sqrt(np.mean((a-np.mean(a))**2)) # (1)
Let's prove it :
In [43]: a = np.arange(1,6).astype(float)
In [44]: np.nanstd(a)
Out[44]: 1.4142135623730951
In [45]: np.sqrt(np.mean((a-np.mean(a))**2))
Out[45]: 1.4142135623730951
Now, let's say, we have a NaN
in it :
In [46]: a[2] = np.nan
In [47]: a
Out[47]: array([ 1., 2., nan, 4., 5.])
The std
with nanstd
would be :
In [48]: np.nanstd(a)
Out[48]: 1.5811388300841898
Let's figure out the equivalent one based on (1)
.
Let's start with (a-np.mean(a))**2
.
This one : ?
In [72]: (a-np.mean(a))**2
Out[72]: array([nan, nan, nan, nan, nan])
No!
This one : ?
In [73]: (a0 - np.sum(a0)/n)**2
Out[73]: array([4., 1., 9., 1., 4.])
No! Because a
is :
In [76]: a
Out[76]: array([ 1., 2., nan, 4., 5.])
We need to make the NaN
position one 0
.
This one : ?
In [75]: m*((a0 - np.sum(a0)/n)**2)
Out[75]: array([4., 1., 0., 1., 4.])
Yes!
Then, what's np.mean((a-np.mean(a))**2)
? It would be, sum of those in [75]
divided by n
:
In [77]: np.sum(m*((a0-np.sum(a0)/n)**2))/n
Out[77]: 2.5
Hence, the final std value :
In [78]: np.sqrt(np.sum(m*((a0-np.sum(a0)/n)**2))/n)
Out[78]: 1.5811388300841898
Summarizing :
In [55]: m = ~np.isnan(a) # (2)
...: a0 = np.where(m, a,0)
...: n = m.sum()
...: out0 = np.sqrt(np.sum(m*((a0-np.sum(a0)/n)**2))/n)
In [56]: out0
Out[56]: 1.5811388300841898
Next part is incorporating the sliding nature. So, we need to do (2)
in a sliding nature. So, the first two steps remains.
Hence, it starts off with :
m = ~np.isnan(a)
a0 = np.where(m, a,0)
But the last two would change, let's see how.
Let's focus on the final step to compute out0
. We have :
m*((a0-np.sum(a0)/n)**2)
Then, we compute the summation :
np.sum(m*((a0-np.sum(a0)/n)**2))
We have : (a-b)**2 = a**2 + b**2 - 2*a*b
. So, earlier step becomes
np.sum(m*(a0**2 + (np.sum(a0)/n)**2 - 2*a0*np.sum(a0)/n))
Further re-arranging leads to :
np.sum(m*(a0**2 + (np.sum(a0)/n)**2) - np.sum(2*a0*np.sum(a0)/n))
np.sum(m*(a0**2 + (np.sum(a0)/n)**2)) - np.sum(2*a0*np.sum(a0)/n)
np.sum(m*(a0**2 + (np.sum(a0)/n)**2)) - 2*np.sum(a0*np.sum(a0))/n
np.sum(m*(a0**2 + (np.sum(a0)/n)**2)) - (2/n)*np.sum(a0*np.sum(a0)) # (3)
Let's focus on the first two parts for the summation.
Also, let's take a sample case to make things concrete. We will setup two datasets - One for the complete array setup and another a windowed version of the complete one.
Setup :
#=========================== 1. Complete setup
a = np.arange(1,10).astype(float)
a[[2,5]] = np.nan
W = 5
k = np.ones(W, dtype=int)
m_comp = ~np.isnan(a)
a0_comp = np.where(m_comp, a,0)
n_comp = np.convolve(m_comp,k,'valid')
c1 = np.convolve(a0_comp, k,'valid')
c2 = np.convolve((a0_comp**2)*m_comp,k,'valid')
#=========================== 2. Windowed setup
a1 = np.arange(1,6).astype(float)
a1[2] = np.nan
m = ~np.isnan(a1)
a0 = np.where(m, a1,0)
n = m.sum()
out0 = np.sqrt(np.sum(m*((a0-np.sum(a0)/n)**2))/n)
From windowed setup, we have :
In [51]: np.sum(m*(a0**2 + (np.sum(a0)/n)**2))
Out[51]: 82.0
In [52]: np.sum(m*(a0**2) + m*((np.sum(a0)/n)**2))
Out[52]: 82.0
In [53]: np.sum(m*(a0**2)) + np.sum(m*((np.sum(a0)/n)**2))
Out[53]: 82.0
First summation part :
In [86]: np.sum(m*(a0**2))
Out[86]: 46.0
# complete setup version :
In [87]: c2
Out[87]: array([ 46., 45., 90., 154., 219.])
Second summation part :
In [54]: np.sum(m*((np.sum(a0)/n)**2))
Out[54]: 36.0
# complete setup version :
In [55]: n_comp*(c1/n_comp)**2
Out[55]:
array([ 36. , 40.33333333, 85.33333333, 144. ,
210.25 ])
The remaining piece of puzzle fromn (3)
is :
In [79]: (2/n)*np.sum(a0*np.sum(a0))
Out[79]: 72.0
Let's focus on the meat of it :
In [80]: np.sum(a0*np.sum(a0))
Out[80]: 144.0
On the complete setup, it would correspond to :
In [81]: c1**2
Out[81]: array([144., 121., 256., 576., 841.])
Thus, for the entire remaining piece :
In [82]: (2/n)*np.sum(a0*np.sum(a0))
Out[82]: 72.0
# complete setup version :
In [83]: (2/n_comp)*c1**2
Out[83]:
array([ 72. , 80.66666667, 170.66666667, 288. ,
420.5 ])
Hence, (3)
and its complete version counterpart would be :
In [89]: np.sum(m*(a0**2 + (np.sum(a0)/n)**2)) - (2/n)*np.sum(a0*np.sum(a0))
Out[89]: 10.0
In [90]: c2 + n_comp*(c1/n_comp)**2 - (2/n_comp)*c1**2
Out[90]: array([10. , 4.66666667, 4.66666667, 10. , 8.75 ])
To get the final std values, we need to divide by the count of valid ones per window and then apply sqrt
:
In [99]: np.sqrt((c2 + n_comp*(c1/n_comp)**2 - (2/n_comp)*c1**2)/n_comp)
Out[99]: array([1.58113883, 1.24721913, 1.24721913, 1.58113883, 1.47901995])
Hence, with some cleanup, we end up with the final nanstd
version.
Upvotes: 1