rpb

Reputation: 3299

How to efficiently create conditional column arrays using NumPy?

The objective is to create an array whose rows (x, y, z) satisfy the conditions (x >= y) and (y >= z).

One naive way that does the job is a nested for loop, as shown below:

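import numpy as np

tot_length = 200
steps = 0.1
start_val = 0.0
list_no = np.arange(start_val, tot_length, steps)

# Start with zero rows; np.zeros((1, 3)) would leave a spurious [0, 0, 0] row
a = np.empty(shape=(0, 3))
for x in list_no:
    for y in list_no:
        for z in list_no:
            if (x >= y) and (y >= z):
                # np.append copies the whole array on every call
                a = np.append(a, [[x, y, z]], axis=0)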

While this raises no memory error, the execution is very slow.

Another approach is the code below. However, it only works as long as tot_length is less than 100. Beyond that, memory issues arise, as reported here.

tot_length = 200
steps = 0.1
start_val = 0.0
list_no = np.arange(start_val, tot_length, steps)
arr = np.meshgrid(*[list_no for _ in range(3)])
a = np.array(list(map(np.ravel, arr))).transpose()
num_rows, num_cols = a.shape

a_list = np.arange(num_cols).reshape((-1, 3))
for x in range(len(a_list)):
    a = a[(a[:, a_list[x, 0]] >= a[:, a_list[x, 1]]) & (a[:, a_list[x, 1]] >= a[:, a_list[x, 2]])]

I would appreciate any suggestion that balances execution time and memory use. I also welcome suggestions using Pandas if that would make things work.

To check whether a proposed solution produces the intended output, the following parameters

tot_length=3
steps=1
start_val=1

should produce the output

1   1   1
2   1   1
2   2   1
2   2   2
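
For small parameters, a brute-force reference can serve as a check against any faster proposal. The following is only a minimal sketch: the helper name `reference_triples` is hypothetical and not part of the original post, and it is meant for tiny inputs like the test case above, not for the full-size problem.

```python
import numpy as np
from itertools import product

def reference_triples(start_val, tot_length, steps):
    """Brute-force list of rows [x, y, z] with x >= y >= z (small inputs only)."""
    list_no = np.arange(start_val, tot_length, steps)
    rows = [(x, y, z) for x, y, z in product(list_no, repeat=3) if x >= y >= z]
    # reshape keeps the result two-dimensional even when no row qualifies
    return np.array(rows).reshape(-1, 3)

expected = reference_triples(start_val=1, tot_length=3, steps=1)
print(expected)
```

A candidate result `a` could then be compared with `np.array_equal(a, expected)`, assuming both list the rows in the same order.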

Upvotes: 0

Views: 93

Answers (2)

Kate Melnykova

Reputation: 1873

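import numpy as np

tot_length = 200
steps = 0.1
list_no = np.arange(0.0, tot_length, steps)

# list_no is sorted ascending, so once a value exceeds its bound
# every later value does too and the inner loop can stop early.
a = list()
for x in list_no:
    for y in list_no:
        if y > x:
            break

        for z in list_no:
            if z > y:
                break

            a.append([x, y, z])

# Convert once at the end instead of calling np.append per row
a = np.array(a)
# if needed, a.transpose()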
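
The same idea (stop as soon as y exceeds x, and z exceeds y) could also be expressed with index arrays, so that only the loop over x stays in Python. The sketch below assumes `list_no` is sorted ascending, as `np.arange` produces; the generator name `triple_blocks` is hypothetical. It yields one block of rows per x value, so blocks can be processed or written out one at a time instead of holding everything in memory.

```python
import numpy as np

def triple_blocks(list_no):
    """Yield, for each x, an array of rows [x, y, z] with x >= y >= z.

    Assumes list_no is sorted in ascending order.
    """
    list_no = np.asarray(list_no)
    for i, x in enumerate(list_no):
        # Lower-triangle indices of an (i+1) x (i+1) grid: all pairs with i >= j >= k
        j, k = np.tril_indices(i + 1)
        block = np.empty((j.size, 3), dtype=list_no.dtype)
        block[:, 0] = x
        block[:, 1] = list_no[j]
        block[:, 2] = list_no[k]
        yield block

# Small test case from the question: start_val=1, tot_length=3, steps=1
a = np.concatenate(list(triple_blocks(np.arange(1, 3, 1))))
print(a)
```

For the full-size problem (roughly 2000 values in `list_no`) the number of qualifying rows is on the order of 1.3 billion, so concatenating every block as done here is only sensible for small inputs.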

Upvotes: 2

Eric

Reputation: 97631

Does something like this work?

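import numpy as np

tot_length = 200
steps = 0.1
list_no = np.arange(0.0, tot_length, steps)
# sparse=True returns three broadcastable views instead of three full N^3 grids
x, y, z = np.meshgrid(*[list_no for _ in range(3)], sparse=True)
# nonzero() returns a tuple of index arrays into list_no, not the values themselves
a = ((x >= y) & (y >= z)).nonzero()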

This will still use 8GB of memory for the intermediate array of booleans, but avoids repeated calls to np.append which are slow.
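
Since `nonzero()` yields indices rather than values, one possible way to turn them into rows of [x, y, z] is sketched below on the small test case from the question. It assumes `indexing="ij"` is passed to `np.meshgrid` so that the three index arrays line up with x, y and z in that order. The boolean mask still has N^3 entries, so this is only a sketch for moderate N.

```python
import numpy as np

# Small grid from the question's test case (start_val=1, tot_length=3, steps=1)
list_no = np.arange(1, 3, 1)

# indexing="ij" makes axis 0 correspond to x, axis 1 to y, axis 2 to z
x, y, z = np.meshgrid(list_no, list_no, list_no, sparse=True, indexing="ij")

# Broadcasting the sparse grids builds one N x N x N boolean mask
ix, iy, iz = ((x >= y) & (y >= z)).nonzero()

# Look the indices up in list_no to recover the actual values
a = np.column_stack((list_no[ix], list_no[iy], list_no[iz]))
print(a)
```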

Upvotes: 1
