Reputation: 109
I have a large array, part of which is shown below. In each list, the first number is the start and the second number is the end (so each list is a range). What I want to do is:
1: filter out the lists (ranges) that are shorter than 300 (e.g. the 18th list in the following array must be removed)
2: get smaller ranges (lists) this way: (start+100) to (start+200), e.g. the first list would be [ 569, 669].
I tried different split functions in numpy, but none of them gives what I am looking for.
array([[ 469, 1300],
[ 171, 1440],
[ 187, 1564],
[ 204, 1740],
[ 40, 1363],
[ 56, 1457],
[ 132, 606],
[1175, 2096],
[ 484, 2839],
[ 132, 4572],
[ 166, 1693],
[ 69, 3300],
[ 142, 1003],
[2118, 2118],
[ 715, 1687],
[ 301, 1006],
[ 48, 2142],
[ 63, 330],
[ 479, 2411]], dtype=uint32)
Do you know how to do that in Python?
Thanks
Upvotes: 2
Views: 1773
Reputation: 142226
Assuming your array is called A, then:
import numpy as np
# Filter out differences not wanted
gt300 = A[(np.diff(A) >= 300).flatten()]
# Set new value of first column
gt300[:,0] += 100
# Set value of second column
gt300[:,1] = gt300[:,0] + 100
Or maybe something like:
B = A[:,0][(np.diff(A) >= 300).flatten()]
C = np.repeat(B, 2).reshape((len(B), 2)) + [100, 200]
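For reference, the two steps above can be combined into one small helper. This is only a sketch: the function name filter_and_window and its parameters min_len, lo and hi are made up for illustration, and it assumes that step 2 should apply only to the ranges that survive step 1. It casts to int64 before subtracting because unsigned subtraction wraps around if an end is ever smaller than its start.

```python
import numpy as np

def filter_and_window(ranges, min_len=300, lo=100, hi=200):
    """Hypothetical helper: drop short ranges, then build (start+lo, start+hi)."""
    ranges = np.asarray(ranges)
    # Cast to a signed type so end - start cannot wrap around for uint32 input.
    start = ranges[:, 0].astype(np.int64)
    end = ranges[:, 1].astype(np.int64)
    keep = (end - start) >= min_len
    kept_start = start[keep]
    # One row per surviving range: [start + lo, start + hi]
    return np.column_stack([kept_start + lo, kept_start + hi])

# A few rows taken from the question's array
A = np.array([[469, 1300], [2118, 2118], [63, 330], [479, 2411]], dtype=np.uint32)
result = filter_and_window(A)
print(result)
```

The result is int64 here; add .astype(A.dtype) at the end if the original dtype matters.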
Upvotes: 2
Reputation: 231665
We can find which rows have the small difference with:
In [745]: mask=(x[:,1]-x[:,0])<300
In [746]: mask
Out[746]:
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, True, False, False, False, True, False], dtype=bool)
We can use that mask to select those rows, or to deselect them:
In [747]: x[mask,:]
Out[747]:
array([[2118, 2118],
[ 63, 330]], dtype=uint32)
In [748]: x[~mask,:]
Out[748]:
array([[ 469, 1300],
[ 171, 1440],
[ 187, 1564],
[ 204, 1740],
...
[ 479, 2411]], dtype=uint32)
To make a new set of ranges, get the first column; here I am using [0] so the selection remains a column array:
In [750]: x[:,[0]]
Out[750]:
array([[ 469],
[ 171],
[ 187],
...
[ 48],
[ 63],
[ 479]], dtype=uint32)
Add to that the desired offsets. This takes advantage of broadcasting.
In [751]: x[:,[0]]+[100,200]
Out[751]:
array([[ 569, 669],
[ 271, 371],
[ 287, 387],
[ 304, 404],
[ 140, 240],
[ 156, 256],
...
[ 401, 501],
[ 148, 248],
[ 163, 263],
[ 579, 679]], dtype=int64)
There are other ways of constructing such an array:
np.column_stack([x[:,0]+100,x[:,0]+200])
np.array([x[:,0]+100, x[:,0]+200]).T # or vstack
Other answers have suggested the Python list filter. I'm partial to list comprehensions in this kind of use, for example:
In [756]: np.array([i for i in x if (i[1]-i[0])<300])
Out[756]:
array([[2118, 2118],
[ 63, 330]], dtype=uint32)
For small lists of lists, the pure Python approach tends to be faster. But if the object is already a numpy array, it is faster to use the numpy operations that work on the whole array at once (i.e. do the iteration in compiled code). Hence my suggestion to use the boolean mask.
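Putting the mask and the broadcasting step together as a plain script, on the array from the question, might look like the sketch below. It assumes the new ranges are wanted only for the rows that pass the filter, and it builds the offsets with the same dtype as x so the result stays uint32 instead of being promoted to int64 as in Out[751]. The names kept, offsets and windows are illustrative.

```python
import numpy as np

x = np.array([[469, 1300], [171, 1440], [187, 1564], [204, 1740],
              [40, 1363], [56, 1457], [132, 606], [1175, 2096],
              [484, 2839], [132, 4572], [166, 1693], [69, 3300],
              [142, 1003], [2118, 2118], [715, 1687], [301, 1006],
              [48, 2142], [63, 330], [479, 2411]], dtype=np.uint32)

# True for the rows whose range is shorter than 300
mask = (x[:, 1] - x[:, 0]) < 300

# Keep everything else
kept = x[~mask]

# Offsets in the array's own dtype, so broadcasting does not change the dtype
offsets = np.array([100, 200], dtype=x.dtype)

# (n, 1) column of starts plus (2,) offsets broadcasts to (n, 2)
windows = kept[:, [0]] + offsets
print(windows)
```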
Upvotes: 0
Reputation: 2577
data = [[ 469, 1300],
# ...
[ 63, 330],
[ 479, 2411]]
print(
# list() is needed on Python 3, where filter returns a lazy iterator
list(filter(lambda v: v[1] - v[0] >= 300, data))
)
print(
[[v[0] + 100, v[0] + 200] for v in data]
)
Explanation:
The first command uses the built-in filter function to keep only the elements for which the lambda expression is true.
The second iterates over the list and generates a new one while doing so.
If the input and output should be numpy arrays, try the following. Note: there is no way to filter a numpy array without creating a new one.
import numpy as np

data = np.array([
( 469, 1300),
( 171, 1440),
# ...
( 63, 330),
( 479, 2411)], dtype=np.uint32)
print(
# list() materializes the filter iterator before building the array
np.array(list(filter(lambda v: v[1] - v[0] >= 300, data)), dtype=np.uint32)
)
print(
np.array([[v[0] + 100, v[0] + 200] for v in data], dtype=np.uint32)
)
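The note that filtering always creates a new array can be checked with np.shares_memory. The sketch below uses a boolean mask instead of filter (a substitution, not the code above) and compares it with a basic slice, which does share memory with the original. The variable names are illustrative.

```python
import numpy as np

data = np.array([[469, 1300], [63, 330], [479, 2411]], dtype=np.uint32)

# Boolean-mask filtering returns a copy of the selected rows
keep = (data[:, 1] - data[:, 0]) >= 300
filtered = data[keep]
is_copy = not np.shares_memory(filtered, data)

# A basic slice, by contrast, is a view onto the same buffer
sliced = data[:2]
is_view = np.shares_memory(sliced, data)

print(is_copy, is_view)
```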
Upvotes: 0
Reputation: 174
A general note first: you should use tuples to represent such ranges, not lists. They are immutable data types in which the order of the items has a meaning.
As for 1, it is pretty easy to filter in Python (using >= so that only ranges shorter than 300 are dropped):
filter(lambda single_range: single_range[1] - single_range[0] >= 300, ranges)
A clearer way (in my opinion) to do this is with a list comprehension:
[(start, end) for start, end in ranges if end - start >= 300]
As for 2, I don't fully understand what you mean, but if you mean creating a new list of ranges, where each range is changed using a single function, what you want is a map (or my preferred way, a list comprehension, which is equivalent but more descriptive):
[(start + 100, start + 200) for start, end in ranges]
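As a rough end-to-end sketch, the two comprehensions can be chained. This assumes the new ranges should be built only from the ranges that survive the filter, which the question does not state explicitly; MIN_LENGTH, kept and windows are names chosen for illustration.

```python
# A few ranges from the question, stored as tuples
ranges = [(469, 1300), (2118, 2118), (63, 330), (479, 2411)]

MIN_LENGTH = 300  # ranges shorter than this are dropped

# Step 1: keep only the ranges that are long enough
kept = [(start, end) for start, end in ranges if end - start >= MIN_LENGTH]

# Step 2: build (start + 100, start + 200) from each surviving range
windows = [(start + 100, start + 200) for start, _ in kept]

print(kept)
print(windows)
```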
Upvotes: 0