Okapi575

Reputation: 718

Slices of start, stop indices of valid (non-NaNs) portions of a NumPy array

I have a large 1-D NumPy array that contains NaNs. I need all the slices that contain no NaNs:

 import numpy as np
 A=np.array([1.0,2.0,3.0,np.nan,4.0,3.0,np.nan,np.nan,np.nan,2.0,2.0,2.0])

The expected result for the example would be:

 Slices=[slice(0,3),slice(4,6),slice(9,12)]

Upvotes: 1

Views: 324

Answers (2)

Divakar

Reputation: 221684

One approach to get such a list of slices, keeping the work done inside the list comprehension to a minimum -

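def start_stop_nonNaN_slices(A):
    # True where the value is valid (not NaN)
    mask = ~np.isnan(A)
    # Pad with False so runs touching either end still produce an edge
    mask_ext = np.r_[False, mask, False]
    # Edges alternate start, stop; reshape pairs them up row by row
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1]).reshape(-1,2)
    return [slice(i[0],i[1]) for i in idx]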

Sample runs -

In [32]: A
Out[32]: 
array([  1.,   2.,   3.,  nan,   4.,   3.,  nan,  nan,  nan,   2.,   2.,
         2.])

In [33]: start_stop_nonNaN_slices(A)
Out[33]: [slice(0, 3, None), slice(4, 6, None), slice(9, 12, None)]

In [35]: A
Out[35]: 
array([ nan,   1.,   2.,   3.,  nan,   4.,   3.,  nan,  nan,  nan,   2.,
         2.,   2.])

In [36]: start_stop_nonNaN_slices(A)
Out[36]: [slice(1, 4, None), slice(5, 7, None), slice(10, 13, None)]
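To see why the padding and the shifted comparison work, here is a step-by-step walk through the intermediate arrays for the question's sample input. This is only an illustrative sketch; the names `edges` and `pairs` are mine and do not appear in the function above.

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0, np.nan, 4.0, 3.0,
              np.nan, np.nan, np.nan, 2.0, 2.0, 2.0])

# True at valid positions: [T T T F T T F F F T T T]
mask = ~np.isnan(A)

# Padding with False on both sides guarantees that every valid run
# has a False -> True edge before it and a True -> False edge after it.
mask_ext = np.r_[False, mask, False]

# Positions where neighbouring padded values differ. Because of the
# leading pad, these positions are already indices into A.
edges = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
print(edges)   # [ 0  3  4  6  9 12]

# Edges alternate start, stop, so pairing them gives one row per run.
pairs = edges.reshape(-1, 2)
print(pairs.tolist())   # [[0, 3], [4, 6], [9, 12]]
```

Each row is a half-open `[start, stop)` range, which is why it can be passed straight to `slice`.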

Output in different formats

I. If you need those start, stop indices as a list of (start, stop) tuples -

def start_stop_nonNaN_slices_v2(A):
    mask = ~np.isnan(A)
    mask_ext = np.r_[False, mask, False]
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    # On Python 3, zip returns an iterator, so wrap it in list()
    return list(zip(idx[::2], idx[1::2]))

Sample run -

In [51]: A
Out[51]: 
array([ nan,   1.,   2.,   3.,  nan,   4.,   3.,  nan,  nan,  nan,   2.,
         2.,   2.,  nan,  nan])

In [52]: start_stop_nonNaN_slices_v2(A)
Out[52]: [(1, 4), (5, 7), (10, 13)]

II. If you are okay with start and stop indices as two output arrays, this version should be pretty efficient, as it avoids both the list comprehension and the zipping -

def start_stop_nonNaN_slices_v3(A):
    mask = ~np.isnan(A)
    mask_ext = np.r_[False, mask, False]
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    return idx[::2], idx[1::2]

Sample run -

In [74]: A
Out[74]: 
array([ nan,   1.,   2.,   3.,  nan,   4.,   3.,  nan,  nan,  nan,   2.,
         2.,   2.,  nan,  nan])

In [75]: starts, stops = start_stop_nonNaN_slices_v3(A)

In [76]: starts
Out[76]: array([ 1,  5, 10])

In [77]: stops
Out[77]: array([ 4,  7, 13])
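With the two-array form, per-run quantities become plain array arithmetic. A small sketch, assuming the v3 function as listed (repeated here so the snippet runs on its own); `lengths`, `longest` and `segment` are names introduced only for illustration:

```python
import numpy as np

def start_stop_nonNaN_slices_v3(A):
    mask = ~np.isnan(A)
    mask_ext = np.r_[False, mask, False]
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    return idx[::2], idx[1::2]

A = np.array([np.nan, 1., 2., 3., np.nan, 4., 3., np.nan, np.nan,
              np.nan, 2., 2., 2., np.nan, np.nan])

starts, stops = start_stop_nonNaN_slices_v3(A)

# Length of every valid run, computed without a Python loop
lengths = stops - starts
print(lengths)   # [3 2 3]

# Index of the first longest run, then the values it covers
longest = int(np.argmax(lengths))
segment = A[starts[longest]:stops[longest]]
print(segment)   # [1. 2. 3.]
```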

Note on performance: np.r_ can be replaced with np.concatenate, which may be faster:

mask_ext = np.concatenate(( [False], mask, [False] ))
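For completeness, here is what the first function might look like with that substitution applied. The name `start_stop_nonNaN_slices_concat` is hypothetical, chosen just for this sketch. Whether it is actually faster depends on array size and NumPy version, so time it on your own data.

```python
import numpy as np

def start_stop_nonNaN_slices_concat(A):
    # Hypothetical variant: same logic as above, np.concatenate for padding
    mask = ~np.isnan(A)
    mask_ext = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1]).reshape(-1, 2)
    # int() gives plain Python integers inside the slice objects
    return [slice(int(start), int(stop)) for start, stop in idx]

A = np.array([1.0, 2.0, 3.0, np.nan, 4.0, 3.0,
              np.nan, np.nan, np.nan, 2.0, 2.0, 2.0])
result = start_stop_nonNaN_slices_concat(A)
print(result)
# [slice(0, 3, None), slice(4, 6, None), slice(9, 12, None)]
```

The output matches the np.r_ version on the question's sample, since only the way the padded mask is built has changed.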

Upvotes: 1

javidcf

Reputation: 59731

Here is a possibility:

import numpy as np

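def valid_slices(array):
    m = ~np.isnan(array)
    # Positions of the valid (non-NaN) values
    idx = np.arange(len(array))[m]
    # No valid values at all: return early instead of indexing an empty array
    if len(idx) == 0:
        return []
    # A gap larger than 1 between valid positions marks a run boundary
    idx_diff = np.diff(idx)
    idx_change = np.where(idx_diff > 1)[0]
    idx_start = np.concatenate([[0], idx_change + 1], axis=0)
    idx_end = np.concatenate([idx_change, [len(idx) - 1]], axis=0)
    return [slice(idx[start], idx[end] + 1) for start, end in zip(idx_start, idx_end)]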

A = np.array([1.0,2.0,3.0,np.nan,4.0,3.0,np.nan,np.nan,np.nan,2.0,2.0,2.0])
print(valid_slices(A))

>>> [slice(0, 3, None), slice(4, 6, None), slice(9, 12, None)]
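As a quick illustration of how the returned slices can be used, the sketch below pulls out each valid segment and computes its mean. The function is restated in a slightly condensed form so the snippet is self-contained; the early return for input with no valid values and the names `segments` and `means` are my own choices, not part of the answer above.

```python
import numpy as np

def valid_slices(array):
    idx = np.flatnonzero(~np.isnan(array))   # positions of valid values
    if idx.size == 0:                        # nothing valid: no slices
        return []
    # Gaps larger than 1 between valid positions separate the runs
    idx_change = np.flatnonzero(np.diff(idx) > 1)
    idx_start = np.concatenate([[0], idx_change + 1])
    idx_end = np.concatenate([idx_change, [idx.size - 1]])
    return [slice(int(idx[s]), int(idx[e]) + 1)
            for s, e in zip(idx_start, idx_end)]

A = np.array([1.0, 2.0, 3.0, np.nan, 4.0, 3.0,
              np.nan, np.nan, np.nan, 2.0, 2.0, 2.0])

# Each slice indexes A directly and yields a NaN-free view
segments = [A[s] for s in valid_slices(A)]
means = [float(seg.mean()) for seg in segments]
print(means)   # [2.0, 3.5, 2.0]
```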

Upvotes: 1
