Reputation: 35321

Efficient grouping in numpy

I have a list of about 10⁶ pairs, where each element of the pair is either -1, 0, or 1:

[
 [ 0,  1],
 [-1, -1],
 [ 0, -1],
 [ 1,  0],
 ...
]

I want to split these pairs into two groups (i.e. lists of pairs) according to whether the first element of the pair is -1 or not¹.

Is there a way to do this efficiently with numpy?

Despite the terminology and notation I used above, I am in fact agnostic about the actual types of the pairs and the "lists" of pairs. Use whatever numpy or python data structure leads to the most efficient solution. (But no pandas, please.)

EDIT:

For example, if the initial list of pairs is

[
 [ 0, -1],
 [ 0, -1],
 [ 1, -1],
 [-1, -1],
 [ 1,  0],
 [-1,  1],
 [-1, -1],
 [ 0,  0],
 [ 0,  1],
 [-1,  0]
]

...an acceptable result would consist of the two lists

[
 [-1, -1],
 [-1,  1],
 [-1, -1],
 [-1,  0]
]

...and

[
 [ 0, -1],
 [ 0, -1],
 [ 1, -1],
 [ 1,  0],
 [ 0,  0],
 [ 0,  1]
]

The two lists above preserve the order in which the pairs appeared in the original list. This would be my preference, but it is not essential. For example, a solution consisting of

[
 [-1, -1],
 [-1, -1],
 [-1,  0],
 [-1,  1]
]

...and

[
 [ 0, -1],
 [ 0, -1],
 [ 0,  0],
 [ 0,  1],
 [ 1, -1],
 [ 1,  0],
]

...would also be acceptable.

¹ In other words, all the pairs in one group should have -1 at their first position, and all the elements of the other group should have either 0 or 1 at their first position.

Upvotes: 1

Answers (3)

Narcisse Doudieu Siewe

Reputation: 1094

You can do it yourself! The only efficiency gain I see comes from using a generator or something similar, which saves memory at the cost of computation time.

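def sanitize(yllp):  # yllp: list-like of pairs
    y = yield  # receives the selector via send(): -1 for the -1 group, anything else for the 0/1 group
    yield      # pause here so that send() returns before iteration starts
    for x in yllp:
        if (x[0] in {0, 1} and y != -1) or x[0] == -1 == y:
            yield x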
Example:

L = [
     (-1,1), 
     (0,1), 
     (0,1), 
     (-1,1), 
     (-1,0), 
     (-1,-1), 
     (0,0), 
     (1,0)
    ]

# get the pairs starting with 0 or 1
w = sanitize(L)
next(w)      # prime the generator (w.next() exists only in Python 2)
w.send(0)
for i in w:
    print(i)

# get the pairs starting with -1
t = sanitize(L)
next(t)
t.send(-1)
for i in t:
    print(i)
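
If plain Python lists are acceptable, a single pass that appends to two lists may be simpler than driving a generator with send(). This is only a minimal sketch, assuming all the pairs fit in memory; split_pairs is a hypothetical helper name, not part of the code above.

```python
def split_pairs(pairs):
    """Split pairs by whether the first element is -1, preserving order."""
    minus, rest = [], []
    for p in pairs:
        # pick the target list for this pair, then append to it
        (minus if p[0] == -1 else rest).append(p)
    return minus, rest


L = [(-1, 1), (0, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, 0), (1, 0)]
minus, rest = split_pairs(L)
```

Unlike the generator version, this walks the input once and returns both groups together, at the cost of holding both lists in memory.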

Upvotes: -1

Sheldore

Reputation: 39072

How about simply applying the condition twice: once to select the rows whose first element is not -1, and once to select the rows whose first element is -1?

import numpy as np

a = np.array([ [ 0, -1], [ 0, -1], [ 1, -1], [-1, -1], [ 1,  0], 
                    [-1,  1], [-1, -1], [ 0,  0], [ 0,  1], [-1,  0]])

pos = a[a[:, 0]!=-1]
neg = a[a[:, 0]==-1]

print (pos)
# [[ 0 -1]
#  [ 0 -1]
#  [ 1 -1]
#  [ 1  0]
#  [ 0  0]
#  [ 0  1]]

print (neg)
# [[-1 -1]
#  [-1  1]
#  [-1 -1]
#  [-1  0]]
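
The two comparisons above each scan the first column. A possible variation, sketched here under the assumption that a is a 2-D integer array as above, builds the boolean mask once and reuses its negation. The name mask is only illustrative.

```python
import numpy as np

a = np.array([[0, -1], [0, -1], [1, -1], [-1, -1], [1, 0],
              [-1, 1], [-1, -1], [0, 0], [0, 1], [-1, 0]])

mask = a[:, 0] == -1   # True where the first element of the pair is -1
neg = a[mask]          # rows whose first element is -1, in original order
pos = a[~mask]         # all remaining rows (first element 0 or 1)
```

Boolean indexing keeps the rows in their original order, so this should match the order-preserving result asked for in the question.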

Upvotes: 2

Raphael

Reputation: 1801

import numpy as np
a = np.random.randint(-1, 2, size=(10, 2))

print(a)
# [[ 0  0]
#  [ 1  1]
#  [ 1  1]
#  [-1 -1]
#  [ 0 -1]
#  [ 1  1]
#  [-1  1]
#  [-1  0]
#  [ 1 -1]
#  [ 1  1]]

minus, zero, one = [np.array([r for r in a if r[0] == c]) for c in [-1, 0, 1]]


print(minus)
# [[-1 -1]
#  [-1  1]
#  [-1  0]]
print(zero)
# [[ 0  0]
#  [ 0 -1]]
print(one)
# [[ 1  1]
#  [ 1  1]
#  [ 1  1]
#  [ 1 -1]
#  [ 1  1]]
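
The list comprehension above iterates over the rows in Python, which may be slow for about 10⁶ pairs. A possible vectorized variant of the same three-way split uses one boolean mask per value instead. This is a sketch only; the fixed array below simply reproduces the sample shown above so that the result is deterministic, and first is an illustrative name.

```python
import numpy as np

# Fixed copy of the sample array printed above (instead of a random draw)
a = np.array([[0, 0], [1, 1], [1, 1], [-1, -1], [0, -1],
              [1, 1], [-1, 1], [-1, 0], [1, -1], [1, 1]])

first = a[:, 0]  # view of the first column
# One boolean-mask selection per value; row order is preserved within each group
minus, zero, one = (a[first == c] for c in (-1, 0, 1))
```

If only the two groups from the question are needed, zero and one can be merged by selecting with first != -1 instead.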

Upvotes: 0
