How to efficiently create conditional column arrays using NumPy?

Question

The objective is to create an array whose rows (x, y, z) satisfy the conditions (x >= y) and (y >= z).

A naive approach that does the job is a nested for loop, as shown below.

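import numpy as np

tot_length=200
steps=0.1
start_val=0.0
list_no = np.arange(start_val, tot_length, steps)

# Start from an empty (0, 3) array so no spurious [0, 0, 0] row is included
a = np.empty(shape=(0, 3))
for x in list_no:
    for y in list_no:
        for z in list_no:
            if (x >= y) and (y >= z):
                # np.append copies the whole array on every call
                a = np.append(a, [[x, y, z]], axis=0)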

Although this raises no memory error, the execution is very slow.
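
Part of the slowdown likely comes from np.append, which returns a new array (copying every existing row) on each call. A minimal sketch, assuming a small grid and a hypothetical helper name triples_via_list, collects the rows in a Python list and converts them once at the end:

```python
import numpy as np

def triples_via_list(start_val, tot_length, steps):
    """Hypothetical helper: same triple loop, but rows go into a list."""
    list_no = np.arange(start_val, tot_length, steps)
    rows = []
    for x in list_no:
        for y in list_no:
            if y > x:
                continue  # no z can help once x >= y fails
            for z in list_no:
                if y >= z:
                    rows.append((x, y, z))
    # Single conversion; reshape keeps the (n, 3) shape even if rows is empty
    return np.array(rows, dtype=float).reshape(-1, 3)

a = triples_via_list(1, 3, 1)
print(a.shape)
```

This still performs a cubic number of Python-level iterations, so it only removes the copying overhead, not the loop cost.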

Another approach is the code below. However, it only works as long as tot_length is less than 100; beyond that, memory issues arise, as reported here

import numpy as np

tot_length = 200
steps = 0.1
start_val = 0.0
list_no = np.arange(start_val, tot_length, steps)
# Build the full 3-D grid and flatten it into an (n**3, 3) array of candidates
arr = np.meshgrid(*[list_no for _ in range(3)])
a = np.array(list(map(np.ravel, arr))).transpose()
num_rows, num_cols = a.shape

# Column indices grouped in threes; with 3 columns this is [[0, 1, 2]]
a_list = np.arange(num_cols).reshape((-1, 3))
for x in range(len(a_list)):
    a = a[(a[:, a_list[x, 0]] >= a[:, a_list[x, 1]]) & (a[:, a_list[x, 1]] >= a[:, a_list[x, 2]])]

I would appreciate any suggestion that balances execution time and memory usage. Suggestions using Pandas are also welcome if that makes things work.
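
One possible way to trade a little speed for a much smaller peak footprint is to handle one x value at a time, so that the boolean mask is only two-dimensional. The sketch below illustrates that idea on a small grid; the generator name iter_triples is hypothetical and not part of NumPy.

```python
import numpy as np

def iter_triples(list_no):
    """Hypothetical generator: yield one (k, 3) block of rows per x value."""
    y = list_no[:, None]   # shape (n, 1)
    z = list_no[None, :]   # shape (1, n)
    yz_ok = y >= z         # 2-D mask for y >= z, computed once
    for xv in list_no:
        # Combine with x >= y for this particular x, then read off indices
        iy, iz = ((xv >= y) & yz_ok).nonzero()
        yield np.column_stack((np.full(iy.size, xv), list_no[iy], list_no[iz]))

# Small illustrative grid: list_no is array([1, 2])
list_no = np.arange(1, 3, 1)
a = np.concatenate(list(iter_triples(list_no)))
print(a)
```

For n grid values the full result has n(n+1)(n+2)/6 rows, which is roughly 1.3 billion rows for the 2000 values in the question, so concatenating every block may itself exhaust memory. Consuming the blocks one at a time (or writing them to disk) keeps the peak usage near a single block.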

To check whether a proposed solution gives the intended result, the following parameters

tot_length=3
steps=1
start_val=1

Should produce the output
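
For reference, the rows that satisfy the condition for these parameters can be enumerated by brute force. A small sketch (the variable name expected is illustrative, not from the original post):

```python
import itertools
import numpy as np

tot_length = 3
steps = 1
start_val = 1
list_no = np.arange(start_val, tot_length, steps)  # array([1, 2])

# Keep every (x, y, z) combination with x >= y >= z
expected = np.array(
    [(x, y, z) for x, y, z in itertools.product(list_no, repeat=3) if x >= y >= z]
)
print(expected)
```

With these parameters list_no holds only 1 and 2, so four rows qualify: [1 1 1], [2 1 1], [2 2 1] and [2 2 2].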

Eric · Accepted Answer

Does something like this work?

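import numpy as np

tot_length=200
steps=0.1
list_no = np.arange(0.0, tot_length, steps)
# sparse=True keeps x, y, z as broadcastable 1-element-wide arrays
x, y, z = np.meshgrid(*[list_no for _ in range(3)], sparse=True)
# a is a tuple of three index arrays, not the values themselves
a = ((x>=y) & (y>=z)).nonzero()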

This still needs about 8 GB of memory for the intermediate boolean array (2000**3 one-byte elements), but it avoids the repeated calls to np.append, which are slow.
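
Note that nonzero() returns a tuple of index arrays rather than the values, and that with meshgrid's default indexing="xy" the first two axes are swapped relative to the argument order. A minimal sketch on a small grid, assuming indexing="ij" is used so that the axes line up with x, y and z, maps the indices back to an (n, 3) array of values:

```python
import numpy as np

tot_length = 3  # small illustrative grid, not the 200 from the question
steps = 1
list_no = np.arange(1, tot_length, steps)  # array([1, 2])

# indexing="ij": axis 0 follows x, axis 1 follows y, axis 2 follows z
x, y, z = np.meshgrid(list_no, list_no, list_no, sparse=True, indexing="ij")
ix, iy, iz = ((x >= y) & (y >= z)).nonzero()

# Look the grid values up by index and stack them as columns
a = np.column_stack((list_no[ix], list_no[iy], list_no[iz]))
print(a)
```

At the question's full size the three index arrays are themselves large (over a billion entries each), so this conversion is only practical for smaller grids or when applied block by block.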
