Han Zhengzu

Reputation: 3852

How to get the positions of elements within a certain percentile range of a 2-D array containing np.NAN

Here is my question, with some background:

My target

Get the array indices of all elements of a that fall in the percentile range (70, 80)

My attempt

  1. Filter out the NaN values in a

    a_noNaN = np.array([0,])
    
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            if not np.isnan(a[i, j]):
                a_noNaN = np.append(a_noNaN, a[i, j])
    
    a_noNaN = a_noNaN[1:]  # the first element "0" is redundant
    
  2. Sort the data in order and determine the value range

    a_noNaN_sort = np.sort(a_noNaN)
    a_70 = np.percentile(a_noNaN, 70)
    a_80 = np.percentile(a_noNaN, 80)
    
  3. Get the array indices in such value range

    k = 0
    indice    = np.array([(i, j) for i in xrange(a.shape[0]) for j in xrange(a.shape[1])])
    indice_in = np.zeros_like(indice)
    for t in range(0,indice.shape[0],1):
        for i in range(a.shape[0]):
            for j in range(a.shape[1]):
                if ((a[i,j]<a_80)& ((a[i,j]>a_70))):
                    indice_in[t] = indice[t]
    

I don't know whether my method is right or wrong.
Is there an easier function I can use to get this done?
Any advice would be appreciated!ヽ(✿゚▽゚)ノ

Upvotes: 1

Views: 435

Answers (1)

Makis Tsantekidis

Reputation: 2728

You can use the NumPy function nanpercentile, which ignores NaN values when computing the percentile:

a_70 = np.nanpercentile(a, 70)
a_80 = np.nanpercentile(a, 80)
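As a rough check of what nanpercentile does, the sketch below compares it with the manual route of masking out NaN first. The small array is made-up illustrative data, not the asker's a, and the names a_valid, manual_70 and direct_70 are chosen only for this example.

```python
import numpy as np

# Hypothetical 3x3 array with two NaN entries (illustrative data only)
a = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [7.0, 8.0, 9.0]])

# Manual route: drop NaN values with a boolean mask, then take the percentile
a_valid = a[~np.isnan(a)]
manual_70 = np.percentile(a_valid, 70)

# Direct route: nanpercentile skips NaN values internally
direct_70 = np.nanpercentile(a, 70)

print(manual_70, direct_70)
```

Both routes should give the same value, so the explicit loop that filters NaN is not needed.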

To find all the elements that lie between the two percentiles, build a boolean mask and pass it to np.nonzero:

bool_indexes = np.logical_and(a > a_70, a < a_80)
indexes = np.nonzero(bool_indexes)

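indexes is a tuple of two 1-D arrays (row indices and column indices), with one entry for each element that lies strictly between a_70 and a_80. NaN entries are excluded automatically, because any comparison with NaN evaluates to False.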
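If (row, column) pairs are more convenient than separate index arrays, np.argwhere returns them directly. The following is a minimal sketch on a made-up 3x4 array that does contain NaN; the data and the names mask, rows, cols and pairs are assumptions for illustration only.

```python
import numpy as np

# Hypothetical 3x4 array: the values 1..10 plus two NaN entries
a = np.array([[1.0, np.nan, 3.0, 4.0],
              [5.0, 6.0, np.nan, 8.0],
              [9.0, 10.0, 2.0, 7.0]])

a_70 = np.nanpercentile(a, 70)
a_80 = np.nanpercentile(a, 80)

# Comparisons against NaN are False, so NaN cells never enter the mask
mask = (a > a_70) & (a < a_80)

rows, cols = np.nonzero(mask)   # one index array per dimension
pairs = np.argwhere(mask)       # shape (n_matches, 2), each row is (i, j)

print(pairs)
print(a[rows, cols])            # the matching values themselves
```

Here only the value 8.0 at position (1, 3) falls strictly between the two percentiles. Use >= and <= in the mask if the bounds should be inclusive.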

Simple demonstration of the execution

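import numpy as np

# create a random (20x20) matrix
a = np.random.randn(20, 20)
# percentile bounds, ignoring any NaN values
a_70 = np.nanpercentile(a, 70)
a_80 = np.nanpercentile(a, 80)
# mask of elements strictly between the two bounds
bool_indexes = np.logical_and(a > a_70, a < a_80)
# tuple of (row indices, column indices)
indexes = np.nonzero(bool_indexes)
print(indexes)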

(array([ 0,  0,  0,  1,  2,  3,  3,  3,  5,  6,  6,  7,  7,  7,  7,  8,  8,
        8,  8,  9, 10, 11, 11, 11, 11, 12, 12, 13, 14, 14, 14, 16, 16, 16,
       16, 17, 19, 19, 19, 19]), 
 array([ 3, 10, 16, 11,  8,  7, 10, 11,  8,  0, 11, 10, 11, 17, 19, 10, 11,
       15, 19, 17,  0,  0,  1,  8, 10,  2,  8, 19, 10, 12, 17,  0,  1,  6,
       14,  6,  3,  4,  5,  7]))

Upvotes: 2
