ozal

Reputation: 33

Searching 2D array with numpy.where with multiple conditions

I have a 2D array of arrays defined as follows:

traces = [['x1',11026,0,0,0,0],
          ['x0',11087,0,0,0,1],
          ['x0',11088,0,0,1,3],
          ['x0',11088,0,0,0,3],
          ['x0',11088,0,1,0,1]]

I want to find the index of the row that matches multiple conditions on selected columns. For example, I want to find the row in this array where

row[0]=='x0' & row[1]==11088 & row[3]==1 & row[5]==1

Searching on these criteria should return 4.

I attempted to use numpy.where but can't seem to make it work with multiple conditions:

print np.where((traces[:,0] == 'x0') & (traces[:,1] == 11088) & (traces[:,3] == 1) & (traces[:,5] == 1))

The above creates the warning

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  print np.where((traces[:,0] == 'x0') & (traces[:,1] == 11088) & (traces[:,3] == 1) & (traces[:,5] == 1))
(array([], dtype=int32),)

I've attempted to use numpy.logical_and as well and that doesn't seem to work either, creating similar warnings.

Any way I can do this using numpy.where without iterating over the whole 2D array?

Thanks

Upvotes: 3

Views: 5670

Answers (2)

MB-F

Reputation: 23637

I strongly assume you did something like this (conversion to np.array):

traces = [['x1',11026,0,0,0,0],
          ['x0',11087,0,0,0,1],
          ['x0',11088,0,0,1,3],
          ['x0',11088,0,0,0,3],
          ['x0',11088,0,1,0,1]]
          
traces = np.array(traces)

This exhibits the described error. The reason can be seen by printing the resulting array:

print(repr(traces))
# array([['x1', '11026', '0', '0', '0', '0'],
#        ['x0', '11087', '0', '0', '0', '1'],
#        ['x0', '11088', '0', '0', '1', '3'],
#        ['x0', '11088', '0', '0', '0', '3'],
#        ['x0', '11088', '0', '1', '0', '1']],
#       dtype='<U5')

Numbers were converted to strings!

When constructing an array from values of different types, numpy looks for a single dtype that can represent all of them. If none exists it falls back to dtype=object, which works in most cases but has bad performance.

In this case, however, numpy found such a type: it converted the data to a string type, which is more specific than object but general enough to take numbers - as strings. Comparing a string column against the integer 11088 then can never match.
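A quick way to confirm this is to inspect the dtype directly. The following is only a minimal sketch, assuming the same traces list as in the question; the names as_default and as_object are made up for illustration, and the exact string width numpy reports may differ between versions.

```python
import numpy as np

traces = [['x1', 11026, 0, 0, 0, 0],
          ['x0', 11087, 0, 0, 0, 1],
          ['x0', 11088, 0, 0, 1, 3],
          ['x0', 11088, 0, 0, 0, 3],
          ['x0', 11088, 0, 1, 0, 1]]

as_default = np.array(traces)               # numpy picks the dtype itself
as_object = np.array(traces, dtype=object)  # elements keep their Python types

print(as_default.dtype.kind)   # 'U' means unicode strings
print(as_object.dtype.kind)    # 'O' means Python objects
print(repr(as_default[0, 1]))  # the number has become a string
print(repr(as_object[0, 1]))   # the number is still an int
```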

As a solution, construct the array explicitly as an "object array":

traces = np.array(traces, dtype='object')

print(np.where((traces[:,0] == 'x0') & (traces[:,1] == 11088) & (traces[:,3] == 1) & (traces[:,5] == 1)))
# (array([4], dtype=int32),)

Note that although this works, object arrays are often not a good idea: they are slow and lose most of numpy's vectorized speed. Consider instead replacing the strings in the first column with numeric values.
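One possible way to do that replacement is sketched below. It assumes the labels can be mapped to integer codes with np.unique; the names names, codes, numeric and x0_code are made up for this illustration and are not part of the question.

```python
import numpy as np

traces = [['x1', 11026, 0, 0, 0, 0],
          ['x0', 11087, 0, 0, 0, 1],
          ['x0', 11088, 0, 0, 1, 3],
          ['x0', 11088, 0, 0, 0, 3],
          ['x0', 11088, 0, 1, 0, 1]]

# Map each distinct label to an integer code (codes index into `names`).
names, codes = np.unique([row[0] for row in traces], return_inverse=True)

# Build a purely integer array: label code first, then the numeric columns.
numeric = np.column_stack([codes, [row[1:] for row in traces]])

# Look up the code that stands for 'x0'.
x0_code = int(np.flatnonzero(names == 'x0')[0])

mask = ((numeric[:, 0] == x0_code) & (numeric[:, 1] == 11088) &
        (numeric[:, 3] == 1) & (numeric[:, 5] == 1))
print(np.where(mask)[0])  # [4]
```

The column positions stay the same as in the original array, so the conditions read the same way; only the first column is compared against a code instead of a string.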

Upvotes: 3

fferri

Reputation: 18940

Consider this comparison:

>>> traces[:,[0,1,3,5]] == ['x0', 11088, 1, 1]
array([[False, False, False, False],
       [ True, False, False,  True],
       [ True,  True, False, False],
       [ True,  True, False, False],
       [ True,  True,  True,  True]])

We are looking for one (or more) row(s) in which all values are True:

>>> np.where(np.all(traces[:,[0,1,3,5]] == ['x0', 11088, 1, 1], axis=1))
(array([4]),)
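This comparison only matches when both sides hold the same kind of elements, so it is worth being explicit about the dtype. The sketch below assumes traces is built as an object array and builds the target values with dtype=object as well; find_rows is a hypothetical helper name, not a numpy function.

```python
import numpy as np

traces = np.array([['x1', 11026, 0, 0, 0, 0],
                   ['x0', 11087, 0, 0, 0, 1],
                   ['x0', 11088, 0, 0, 1, 3],
                   ['x0', 11088, 0, 0, 0, 3],
                   ['x0', 11088, 0, 1, 0, 1]], dtype=object)

def find_rows(arr, conditions):
    """Hypothetical helper: indices of rows where arr[:, col] == value
    for every (col, value) pair in `conditions`."""
    cols = list(conditions)
    # dtype=object keeps 'x0' a str and 11088 an int, matching `arr`.
    wanted = np.array([conditions[c] for c in cols], dtype=object)
    hits = arr[:, cols] == wanted          # shape (n_rows, n_conditions)
    return np.flatnonzero(hits.all(axis=1))

print(find_rows(traces, {0: 'x0', 1: 11088, 3: 1, 5: 1}))  # [4]
```

Passing the conditions as a column-to-value mapping keeps the column indices and the expected values next to each other, which is easier to read than two parallel lists.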

Upvotes: 2
