asdflkjwqerqwr
asdflkjwqerqwr

Reputation: 1189

multidimensional boolean array indexing in numpy

I have a two 2D arrays, one of numbers and one of boolean values:

x = 
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.],
       [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
       [ 7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.],
       [ 8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.],
       [ 9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.]])

idx = 
array([[False, False, False, False, False, False, False, False, False, False],
       [False,  True,  True,  True,  True,  True, False, False, False, False],
       [False,  True,  True,  True,  True,  True, False, False, False, False],
       [False,  True,  True,  True,  True,  True, False, False, False, False],
       [False, False, False,  True,  True,  True,  True, False, False, False],
       [False, False, False, False,  True,  True,  True, False, False, False],
       [False, False, False, False, False, False,  True, False, False, False],
       [False, False, False, False, False, False, False,  True, False, False],
       [False, False, False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False, False, False]], dtype=bool)

When I index the array it returns a 1D array:

x[idx]
array([ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.,  3.,  3.,  3.,
    3.,  3.,  4.,  4.,  4.,  4.,  5.,  5.,  5.,  6.,  7.])

How do I index the array and return a 2D array with the expected output:

x[idx]
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.],
       [ 6.],
       [ 7.]])

Upvotes: 6

Views: 3538

Answers (2)

asdflkjwqerqwr
asdflkjwqerqwr

Reputation: 1189

EDIT:This creates an array of lists

np.array([val[idx[i]].tolist() for i,val in enumerate(x) if len(val[idx[i]].tolist()) > 0])

array([[1.0, 1.0, 1.0, 1.0, 1.0], 
   [2.0, 2.0, 2.0, 2.0, 2.0],
   [3.0, 3.0, 3.0, 3.0, 3.0], 
   [4.0, 4.0, 4.0, 4.0], 
   [5.0, 5.0, 5.0],
   [6.0], 
   [7.0]], dtype=object)

Upvotes: 0

tktk
tktk

Reputation: 11734

Your command returns a 1D array since it's impossible to fulfill without (a) destroying the column structure, which is usually needed. e.g., the 7 in your requested output originally belonged to column 7, and now it's on column 0; and (b) numpy does not, afaik, support high dimensional array with different sizes on the same dimension. What I mean is that numpy can't have an array whose first three rows are of length 5, 4th row of length 4, etc. - all the rows (same dimension) need to have the same length.

I think the best result you could hope for is an array of arrays (and not a 2D array). This is how I would construct it, though there are probably better ways I don't know of:

In [9]: from itertools import izip
In [11]: array([r[ridx] for r, ridx in izip(x, idx) if ridx.sum() > 0])
Out[11]: 
array([array([ 1.,  1.,  1.,  1.,  1.]), array([ 2.,  2.,  2.,  2.,  2.]),
       array([ 3.,  3.,  3.,  3.,  3.]), array([ 4.,  4.,  4.,  4.]),
       array([ 5.,  5.,  5.]), array([ 6.]), array([ 7.])], dtype=object)

Upvotes: 4

Related Questions