Reputation: 1638
Python newbie here. I have read "Filter rows of a numpy array?" and the docs, but still can't figure out how to code it the Python way.
Example array I have: (the real data is 50000 x 10)
a = numpy.asarray([[2,'a'],[3,'b'],[4,'c'],[5,'d']])
filter = ['a','c']
I need to find all rows in a with a[:, 1] in filter. Expected result:
[[2,'a'],[4,'c']]
My current code is this:
numpy.asarray([x for x in a if x[1] in filter ])
It works okay but I have read somewhere that it is not efficient. What is the proper numpy method for this?
Thanks for all the correct answers! Unfortunately I can only mark one as the accepted answer. I am surprised that numpy.in1d did not turn up in Google searches for numpy filter 2d array.
Upvotes: 12
Views: 35435
Reputation: 231375
In this case, where len(filter) is sufficiently smaller than len(a[:,1]), np.in1d does an iterative version of
mask = (a[:,1,None] == filter[None,:]).any(axis=1)
a[mask,:]
It does (adapting the in1d code):
In [1301]: arr1=a[:,1];arr2=np.array(filter)
In [1302]: mask=np.zeros(len(arr1),dtype=bool)
In [1303]: for i in arr2:
      ...:     mask |= (arr1==i)
In [1304]: mask
Out[1304]: array([ True, False, True, False], dtype=bool)
With more items in filter it would build its search around unique, concatenate and argsort, looking for duplicates.
So its convenience hides a fair amount of complexity.
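To make the sort-based idea concrete, here is a rough sketch of how such a membership test can be assembled from unique, concatenate and argsort. This is a simplified illustration, not NumPy's actual source; the function name sorted_membership and the variable wanted (used instead of filter to avoid shadowing the built-in) are made up for this example.

```python
import numpy as np

def sorted_membership(values, candidates):
    """Simplified sketch of a sort-based membership test (not NumPy's code)."""
    # Deduplicate both inputs; rev_idx maps each original value
    # to its position in uniq_values.
    uniq_values, rev_idx = np.unique(values, return_inverse=True)
    uniq_candidates = np.unique(candidates)
    # Put both sets into one array and sort it stably, so equal items
    # end up adjacent, with the entry from uniq_values first.
    combined = np.concatenate((uniq_values, uniq_candidates))
    order = combined.argsort(kind="stable")
    sorted_combined = combined[order]
    # An item followed by an equal item must occur in both inputs.
    has_twin = np.concatenate(
        (sorted_combined[1:] == sorted_combined[:-1], [False])
    )
    # Scatter the flags back to the unsorted positions.
    flags = np.empty(combined.shape, dtype=bool)
    flags[order] = has_twin
    # Keep the flags for uniq_values and expand to the original length.
    return flags[:len(uniq_values)][rev_idx]

a = np.asarray([[2, 'a'], [3, 'b'], [4, 'c'], [5, 'd']])
wanted = np.asarray(['a', 'c'])
mask = sorted_membership(a[:, 1], wanted)
result = a[mask]
```

The duplicate check works because each half of the combined array is already unique, so two equal neighbours after sorting can only come from different inputs.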
Upvotes: 0
Reputation: 9726
A somewhat elaborate pure numpy vectorized solution:
>>> import numpy
>>> a = numpy.asarray([[2,'a'],[3,'b'],[4,'c'],[5,'d']])
>>> filter = numpy.array(['a','c'])
>>> a[(a[:,1,None] == filter[None,:]).any(axis=1)]
array([['2', 'a'],
       ['4', 'c']],
      dtype='|S21')
None in the index creates a singleton dimension, so we can compare the column of a with the row of filter, and then reduce the resulting boolean array
>>> a[:,1,None] == filter[None,:]
array([[ True, False],
       [False, False],
       [False,  True],
       [False, False]], dtype=bool)
over the second dimension with any.
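It may help to look at the shapes at each step of the broadcast. The sketch below splits the one-liner into named pieces; the names column, row, table and wanted are only illustrative (wanted stands in for filter).

```python
import numpy as np

a = np.asarray([[2, 'a'], [3, 'b'], [4, 'c'], [5, 'd']])
wanted = np.array(['a', 'c'])

column = a[:, 1, None]    # shape (4, 1): one entry per row of a
row = wanted[None, :]     # shape (1, 2): one entry per filter value
table = column == row     # broadcasts to shape (4, 2)
mask = table.any(axis=1)  # shape (4,): True where any filter value matched

print(column.shape, row.shape, table.shape, mask.shape)
print(a[mask])
```

Note that the intermediate table holds one boolean per (row, filter value) pair, so its size grows with the product of both lengths.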
Upvotes: 3
Reputation: 5177
You can use a bool index array that you can produce using np.in1d.
You can index a np.ndarray along any axis you want using, for example, an array of bools indicating whether an element should be included. Since you want to index along axis=0, meaning you want to choose from the outermost index, you need a 1D np.array whose length is the number of rows. Each of its elements indicates whether the corresponding row should be included.
A fast way to get this is to use np.in1d on the second column of a. You get all elements of that column with a[:, 1]. Now you have a 1D np.array whose elements should be checked against your filter. That's what np.in1d is for.
So the complete code would look like:
import numpy as np
a = np.asarray([[2,'a'],[3,'b'],[4,'c'],[5,'d']])
filter = np.asarray(['a','c'])
a[np.in1d(a[:, 1], filter)]
or in a longer form:
import numpy as np
a = np.asarray([[2,'a'],[3,'b'],[4,'c'],[5,'d']])
filter = np.asarray(['a','c'])
mask = np.in1d(a[:, 1], filter)
a[mask]
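As a side note, newer NumPy releases also provide np.isin, which as far as I know is the recommended replacement for np.in1d in new code and takes the same arguments here. A short sketch under that assumption, with the illustrative names wanted, keep, kept and dropped (none of them from the question):

```python
import numpy as np

a = np.asarray([[2, 'a'], [3, 'b'], [4, 'c'], [5, 'd']])
wanted = ['a', 'c']  # a plain list works too

keep = np.isin(a[:, 1], wanted)  # 1D bool mask, one entry per row
kept = a[keep]                   # rows whose second column is in wanted
dropped = a[~keep]               # the complementary rows

# Every element of a is a string, because a NumPy array has a single dtype;
# convert the first column back if numbers are needed.
numbers = kept[:, 0].astype(int)
```

Negating the mask with ~ selects the rows that do not match, which avoids computing the membership test twice.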
Upvotes: 7
Reputation: 16629
Try this:
>>> a[numpy.in1d(a[:,1], filter)]
array([['2', 'a'],
       ['4', 'c']],
      dtype='|S21')
Also go through http://docs.scipy.org/doc/numpy/reference/generated/numpy.in1d.html
Upvotes: 2