user3925736

Reputation: 109

Split elements in an array using Python

I have a big array, part of which is shown below. In each list, the first number is the start and the second number is the end (so each list is a range). What I want to do is:

1: Filter out the lists (ranges) that are shorter than 300, i.e. where end - start < 300 (e.g. the 18th list in the following array must be removed).

2: Get smaller ranges (lists) in this way: (start+100) to (start+200), e.g. the first list would become [569, 669].

I tried different split functions in numpy, but none of them gives what I am looking for.

array([[ 469, 1300],
       [ 171, 1440],
       [ 187, 1564],
       [ 204, 1740],
       [  40, 1363],
       [  56, 1457],
       [ 132,  606],
       [1175, 2096],
       [ 484, 2839],
       [ 132, 4572],
       [ 166, 1693],
       [  69, 3300],
       [ 142, 1003],
       [2118, 2118],
       [ 715, 1687],
       [ 301, 1006],
       [  48, 2142],
       [  63,  330],
       [ 479, 2411]], dtype=uint32)

Do you guys know how to do that in Python?

Thanks

Upvotes: 2

Views: 1773

Answers (4)

Jon Clements

Reputation: 142226

Assuming your array is called A, then:

import numpy as np

# Keep only rows where end - start >= 300 (boolean indexing returns a copy)
gt300 = A[(np.diff(A) >= 300).flatten()]

# New start: start + 100
gt300[:,0] += 100

# New end: new start + 100, i.e. original start + 200
gt300[:,1] = gt300[:,0] + 100

Or maybe something like:

B = A[:,0][(np.diff(A) >= 300).flatten()]
C = np.repeat(B, 2).reshape((len(B), 2)) + [100, 200]
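
For anyone who wants to run the first approach end to end, here is a minimal sketch. It assumes the array is called A, as above, and uses only a handful of rows from the question as sample data; the names keep and result are illustrative.

```python
import numpy as np

# A few rows from the question; the full array works the same way.
A = np.array([[469, 1300],
              [132, 606],
              [2118, 2118],
              [63, 330],
              [479, 2411]], dtype=np.uint32)

# np.diff along the last axis gives end - start with shape (n, 1);
# ravel() flattens it into a 1-D boolean mask with one entry per row.
keep = (np.diff(A) >= 300).ravel()

# Boolean indexing returns a copy, so A itself is left unchanged.
result = A[keep]
result[:, 0] += 100                # new start = start + 100
result[:, 1] = result[:, 0] + 100  # new end   = start + 200

print(result.tolist())  # [[569, 669], [232, 332], [579, 679]]
```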

Upvotes: 2

hpaulj

Reputation: 231665

We can find which rows have the small difference with:

In [745]: mask=(x[:,1]-x[:,0])<300
In [746]: mask
Out[746]: 
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False,  True, False, False, False,  True, False], dtype=bool)
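
One caveat worth knowing, since the array is uint32: unsigned subtraction wraps around instead of going negative, so a row whose end is smaller than its start would produce a huge difference and slip past the mask. None of the rows shown has that problem, so this is only a precaution. A sketch, with a made-up row [500, 100] added purely for illustration:

```python
import numpy as np

# The second row is hypothetical (end < start); it is not in the question's data.
x = np.array([[469, 1300],
              [500, 100]], dtype=np.uint32)

wrapped = x[:, 1] - x[:, 0]      # uint32 arithmetic wraps modulo 2**32
signed = x.astype(np.int64)      # widen to a signed type first
safe = signed[:, 1] - signed[:, 0]

print(wrapped.tolist())  # [831, 4294966896]
print(safe.tolist())     # [831, -400]

mask = safe < 300        # True marks the rows to drop
print(mask.tolist())     # [False, True]
```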

We can use that mask to select those rows, or to deselect them:

In [747]: x[mask,:]
Out[747]: 
array([[2118, 2118],
       [  63,  330]], dtype=uint32)
In [748]: x[~mask,:]
Out[748]: 
array([[ 469, 1300],
       [ 171, 1440],
       [ 187, 1564],
       [ 204, 1740],
       ...
       [ 479, 2411]], dtype=uint32)

To make a new set of ranges, get the first column. Here I am indexing with the list [0] rather than the scalar 0 so that the selection remains a column array:

In [750]: x[:,[0]]
Out[750]: 
array([[ 469],
       [ 171],
       [ 187],
        ...
       [  48],
       [  63],
       [ 479]], dtype=uint32)

Add to that the desired offsets. This takes advantage of broadcasting.

In [751]: x[:,[0]]+[100,200]
Out[751]: 
array([[ 569,  669],
       [ 271,  371],
       [ 287,  387],
       [ 304,  404],
       [ 140,  240],
       [ 156,  256],
      ...
       [ 401,  501],
       [ 148,  248],
       [ 163,  263],
       [ 579,  679]], dtype=int64)
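
To see why the broadcasting step works, it can help to look at the shapes involved. A small sketch using three rows from the question (the names starts, offsets and ranges are illustrative):

```python
import numpy as np

x = np.array([[469, 1300],
              [171, 1440],
              [187, 1564]], dtype=np.uint32)

starts = x[:, [0]]               # shape (3, 1): a list index keeps the column axis
offsets = np.array([100, 200])   # shape (2,)

# (3, 1) combined with (2,) broadcasts to (3, 2):
# every start is paired with every offset.
ranges = starts + offsets

print(starts.shape, offsets.shape, ranges.shape)  # (3, 1) (2,) (3, 2)
print(ranges.tolist())  # [[569, 669], [271, 371], [287, 387]]
```

With x[:, 0] instead, the left operand would have shape (3,), which cannot be broadcast against (2,), so the addition would raise an error.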

There are other ways of constructing such an array:

np.column_stack([x[:,0]+100,x[:,0]+200])
np.array([x[:,0]+100, x[:,0]+200]).T   # or vstack

Other answers have suggested the Python list filter. I'm partial to list comprehensions in this kind of use, for example:

In [756]: np.array([i for i in x if (i[1]-i[0])<300])
Out[756]: 
array([[2118, 2118],
       [  63,  330]], dtype=uint32)

For small lists of lists, the pure Python approach tends to be faster. But if the object is already a numpy array, it is faster to use the numpy operations that work on the whole array at once (i.e. do the iteration in compiled code). Hence my suggestion to use the boolean mask.
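
Putting the two pieces together (mask first, then broadcasting) might look like the sketch below. It uses five rows from the question; kept and new_ranges are names chosen here for illustration.

```python
import numpy as np

x = np.array([[469, 1300],
              [2118, 2118],
              [301, 1006],
              [63, 330],
              [479, 2411]], dtype=np.uint32)

mask = (x[:, 1] - x[:, 0]) < 300        # True for ranges shorter than 300
kept = x[~mask]                         # step 1: drop the short ranges
new_ranges = kept[:, [0]] + [100, 200]  # step 2: (start+100, start+200)

print(new_ranges.tolist())  # [[569, 669], [401, 501], [579, 679]]
```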

Upvotes: 0

Simon Kirsten

Reputation: 2577

data = [[ 469, 1300],
        # ...
        [  63,  330],
        [ 479, 2411]]

print(
    list(filter(lambda v: v[1] - v[0] >= 300, data))  # list() is needed on Python 3
)

print(
    [[v[0] + 100, v[0] + 200] for v in data]
)

Explanation:

The first command uses the built-in filter function to keep only the elements for which the lambda expression is true.

The second iterates over the list and generates a new one while doing so.
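
Note that the two commands above work independently on data. If the new ranges should be built only from the rows that pass the filter, the steps can be chained. A sketch for Python 3, where filter returns a lazy iterator that has to be wrapped in list() (the names long_enough and shifted are illustrative):

```python
data = [[469, 1300],
        [2118, 2118],
        [63, 330],
        [479, 2411]]

# In Python 3, filter() is lazy; list() forces it to produce the rows.
long_enough = list(filter(lambda v: v[1] - v[0] >= 300, data))

# Build the new ranges only from the rows that survived the filter.
shifted = [[v[0] + 100, v[0] + 200] for v in long_enough]

print(long_enough)  # [[469, 1300], [479, 2411]]
print(shifted)      # [[569, 669], [579, 679]]
```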

If the input and output should be numpy arrays, try the following. Note: there is no way to filter a numpy array without creating a new one.

import numpy as np

data = np.array([
    ( 469, 1300),
    ( 171, 1440),
    # ...
    (  63,  330),
    ( 479, 2411)], dtype=np.uint32)

print(
    # list() materialises the filter iterator (required on Python 3)
    np.array(list(filter(lambda v: v[1] - v[0] >= 300, data)), dtype=np.uint32)
)

print(
    np.array([[v[0] + 100, v[0] + 200] for v in data], dtype=np.uint32)
)
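
The remark that filtering always creates a new array can be checked directly. In this sketch, np.shares_memory reports whether two arrays overlap in memory; a plain slice is a view, while a boolean-mask selection is a copy. The names view and filtered are illustrative.

```python
import numpy as np

data = np.array([[469, 1300],
                 [2118, 2118],
                 [479, 2411]], dtype=np.uint32)

view = data[:2]                                  # basic slicing: a view
filtered = data[data[:, 1] - data[:, 0] >= 300]  # boolean mask: a copy

print(np.shares_memory(data, view))      # True
print(np.shares_memory(data, filtered))  # False

filtered[0, 0] = 0   # modifying the copy...
print(data[0, 0])    # 469, the original is untouched
```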

Upvotes: 0

Ehud Halamish

Reputation: 174

A general note first: you should use tuples to represent such ranges, not lists. They are immutable data types in which the order of the items carries meaning.

As for 1, it is pretty easy to filter in Python (using >= so that ranges of exactly 300 are kept, since only those smaller than 300 should go):

filter(lambda single_range: single_range[1] - single_range[0] >= 300, ranges)

A clearer way (in my opinion) to do this is with a list comprehension:

[(start, end) for start, end in ranges if end - start >= 300]

As for 2, I don't fully understand what you mean, but if you mean creating a new list of ranges, where each range is changed using a single function, then you want a map (or my preferred way, a list comprehension, which is equivalent but more descriptive):

[(start + 100, start + 200) for start, end in ranges]
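
Both steps can also be folded into a single comprehension. A sketch, assuming ranges is a list of (start, end) tuples and that "smaller than 300" means ranges of exactly 300 are kept:

```python
# Sample rows from the question, written as tuples.
ranges = [(469, 1300), (2118, 2118), (301, 1006), (63, 330)]

# Keep ranges at least 300 long, then rebuild each one from its start.
new_ranges = [(start + 100, start + 200)
              for start, end in ranges
              if end - start >= 300]

print(new_ranges)  # [(569, 669), (401, 501)]
```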

Upvotes: 0
