Reputation: 1716
I have a 2D array. I need to keep only the rows whose value at a particular index appears in a given list.
Here's an example.
My data:
arr= [[ 1.681, 1.365, 0.105, 0.109, 0.50],
[ 1.681, 1.365, 0.105, 0.109, 0.51],
[ 1.681, 1.365, 0.105, 0.109, 0.52],
[ 1.681, 1.365, 0.105, 0.109, 0.53],
[ 1.681, 1.365, 0.105, 0.109, 0.54],
[ 1.681, 1.365, 0.105, 0.109, 0.55],
[ 1.681, 1.365, 0.105, 0.109, 0.56],
[ 1.681, 1.365, 0.105, 0.109, 0.57],
[ 1.681, 1.365, 0.105, 0.109, 0.58],
[ 1.681, 1.365, 0.105, 0.109, 0.59],
[ 1.681, 1.365, 0.105, 0.109, 0.60]]
Let's say I want to filter for rows where the last entry is from the list 0.5,0.55,0.6.
I tried making a mask as follows:
>>> mask= arr['f4'] in [0.5, 0.55, 0.6]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
>>> mask= arr['f4']==0.5 or arr['f4']==0.55 or arr['f4']==0.6
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
>>>
As shown, neither attempt works.
Desired output is:
>>> arr_mask
[[1.681, 1.365, 0.105, 0.109, 0.5], [1.681, 1.365, 0.105, 0.109, 0.55], [1.681, 1.365, 0.105, 0.109, 0.6]]
Your feedback is appreciated.
EDIT1: There was a question about 'f4'. That seems to come from the way I read the data from a file into the array.
>>> arr= np.genfromtxt('data.rpt',dtype=None)
>>> arr
array([ ('tag', 1.681, 1.365, 0.105, 0.109, 0.5),
('tag', 1.681, 1.365, 0.105, 0.109, 0.51),
('tag', 1.681, 1.365, 0.105, 0.109, 0.52),
('tag', 1.681, 1.365, 0.105, 0.109, 0.53),
('tag', 1.681, 1.365, 0.105, 0.109, 0.54),
('tag', 1.681, 1.365, 0.105, 0.109, 0.55),
('tag', 1.681, 1.365, 0.105, 0.109, 0.56),
('tag', 1.681, 1.365, 0.105, 0.109, 0.57),
('tag', 1.681, 1.365, 0.105, 0.109, 0.58),
('tag', 1.681, 1.365, 0.105, 0.109, 0.59),
('tag', 1.681, 1.365, 0.105, 0.109, 0.6)],
dtype=[('f0', 'S837'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8')])
EDIT02:
I tried the proposal from jp_data_analysis, but it does not work. Could this be because the array was read from a file?
>>> arr_np = np.array(arr)
>>> search = np.array([0.50, 0.55, 0.60])
>>> arr_np[np.in1d(arr_np[:,-1], search)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: too many indices for array
>>>
Upvotes: 2
Views: 362
Reputation: 164613
For a vectorised approach, try numpy:
import numpy as np
arr= [[ 1.681, 1.365, 0.105, 0.109, 0.50],
[ 1.681, 1.365, 0.105, 0.109, 0.51],
[ 1.681, 1.365, 0.105, 0.109, 0.52],
[ 1.681, 1.365, 0.105, 0.109, 0.53],
[ 1.681, 1.365, 0.105, 0.109, 0.54],
[ 1.681, 1.365, 0.105, 0.109, 0.55],
[ 1.681, 1.365, 0.105, 0.109, 0.56],
[ 1.681, 1.365, 0.105, 0.109, 0.57],
[ 1.681, 1.365, 0.105, 0.109, 0.58],
[ 1.681, 1.365, 0.105, 0.109, 0.59],
[ 1.681, 1.365, 0.105, 0.109, 0.60]]
arr = np.array(arr)
search = np.array([0.50, 0.55, 0.60])
arr[np.in1d(arr[:,-1], search)]
# array([[ 1.681, 1.365, 0.105, 0.109, 0.5 ],
# [ 1.681, 1.365, 0.105, 0.109, 0.55 ],
# [ 1.681, 1.365, 0.105, 0.109, 0.6 ]])
I expect this to be more efficient for larger arrays.
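If the array comes from np.genfromtxt with dtype=None, as in the question's EDIT1, it is a 1-D structured array with named fields, so arr[:,-1] raises "too many indices". Here is a minimal sketch for that case. It assumes the last column is the last named field ('f5' in the printed dtype). The inline text is a hypothetical stand-in for data.rpt, and np.isin is used in place of np.in1d:

```python
import io
import numpy as np

# Hypothetical stand-in for 'data.rpt': a text tag followed by five floats.
text = """tag 1.681 1.365 0.105 0.109 0.50
tag 1.681 1.365 0.105 0.109 0.51
tag 1.681 1.365 0.105 0.109 0.55
tag 1.681 1.365 0.105 0.109 0.60
"""

# dtype=None lets genfromtxt infer a type per column, which produces a
# 1-D structured array with fields f0..f5 instead of a 2-D float array.
records = np.genfromtxt(io.StringIO(text), dtype=None, encoding="utf-8")

search = np.array([0.50, 0.55, 0.60])

# Select the column by field name rather than by position.
last_field = records.dtype.names[-1]
mask = np.isin(records[last_field], search)
filtered = records[mask]

print(filtered)
```

The exact comparison works here because both the file values and the search values are parsed from the same decimal text. Values that come out of a calculation may need a tolerance instead.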
Upvotes: 1
Reputation: 352
arr= np.array([[ 1.681, 1.365, 0.105, 0.109, 0.50],
[ 1.681, 1.365, 0.105, 0.109, 0.51],
[ 1.681, 1.365, 0.105, 0.109, 0.52],
[ 1.681, 1.365, 0.105, 0.109, 0.53],
[ 1.681, 1.365, 0.105, 0.109, 0.54],
[ 1.681, 1.365, 0.105, 0.109, 0.55],
[ 1.681, 1.365, 0.105, 0.109, 0.56],
[ 1.681, 1.365, 0.105, 0.109, 0.57],
[ 1.681, 1.365, 0.105, 0.109, 0.58],
[ 1.681, 1.365, 0.105, 0.109, 0.59],
[ 1.681, 1.365, 0.105, 0.109, 0.60]])
mask=[.5,.6,.55]
# keep only the rows whose last entry appears in mask
arr_mask = np.array([x for x in arr if np.isin(x[-1], mask)])
Upvotes: 0
Reputation: 914
The answers you've got are using numpy, but in case you are not able to use numpy, this could work too.
You can use a list comprehension (like @interent_user said):
masked_data = [ x for x in arr if x[-1] in [0.5, 0.55, 0.6] ]
You can also use filter:
masked_data = list(filter(lambda x: x[-1] in [0.5, 0.55, 0.6], arr))
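One caveat with the plain `in` test: it compares floats exactly. That is fine for values typed or parsed as the same decimal literals, but it can miss values that were computed. Here is a rough sketch of a tolerant variant using math.isclose. The helper name matches_any and the tolerance 1e-9 are just illustrative choices, not anything from the question:

```python
import math

rows = [[1.681, 1.365, 0.105, 0.109, 0.50],
        [1.681, 1.365, 0.105, 0.109, 0.51],
        [1.681, 1.365, 0.105, 0.109, 0.1 + 0.2]]  # computed, not exactly 0.3

targets = [0.5, 0.3]

# Exact membership: the computed value is not equal to 0.3, so it is dropped.
exact = [row for row in rows if row[-1] in targets]

def matches_any(value, candidates, abs_tol=1e-9):
    """Return True if value is within tolerance of any candidate."""
    return any(math.isclose(value, c, abs_tol=abs_tol) for c in candidates)

# Tolerant membership: keeps the row with the computed value as well.
tolerant = [row for row in rows if matches_any(row[-1], targets)]

print(len(exact), len(tolerant))
```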
Upvotes: 0
Reputation: 3706
This is basically from the np.where docs:
import numpy as np
arr= np.array([[ 1.681, 1.365, 0.105, 0.109, 0.50],
[ 1.681, 1.365, 0.105, 0.109, 0.51],
[ 1.681, 1.365, 0.105, 0.109, 0.52],
[ 1.681, 1.365, 0.105, 0.109, 0.53],
[ 1.681, 1.365, 0.105, 0.109, 0.54],
[ 1.681, 1.365, 0.105, 0.109, 0.55],
[ 1.681, 1.365, 0.105, 0.109, 0.56],
[ 1.681, 1.365, 0.105, 0.109, 0.57],
[ 1.681, 1.365, 0.105, 0.109, 0.58],
[ 1.681, 1.365, 0.105, 0.109, 0.59],
[ 1.681, 1.365, 0.105, 0.109, 0.60]])
ix = np.isin(arr[:,-1], [0.5,0.55,0.6])
np.where(ix)
Out[107]: (array([ 0, 5, 10], dtype=int64),)
arr[np.where(ix),:]
Out[108]:
array([[[ 1.681, 1.365, 0.105, 0.109, 0.5 ],
[ 1.681, 1.365, 0.105, 0.109, 0.55 ],
[ 1.681, 1.365, 0.105, 0.109, 0.6 ]]])
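Note the extra pair of brackets in that last output: indexing with the tuple returned by np.where adds a leading axis. A small sketch of the difference, using a shortened copy of the data; indexing with the boolean mask itself appears to give the 2-D result the question asked for:

```python
import numpy as np

arr = np.array([[1.681, 1.365, 0.105, 0.109, 0.50],
                [1.681, 1.365, 0.105, 0.109, 0.51],
                [1.681, 1.365, 0.105, 0.109, 0.55],
                [1.681, 1.365, 0.105, 0.109, 0.60]])

ix = np.isin(arr[:, -1], [0.5, 0.55, 0.6])

# np.where(ix) is a one-element tuple; used as a row index it behaves
# like an index array of shape (1, 3), so the result gains a leading axis.
via_where = arr[np.where(ix), :]

# The boolean mask selects the matching rows directly.
via_mask = arr[ix]

print(via_where.shape, via_mask.shape)
```

If the integer positions are needed, arr[np.where(ix)[0]] should also return the 2-D result.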
Upvotes: 1