Can this loopy array process be sped up?

Question

Consider two given arrays (in this sample, both are based on n = 5).

Given: array m has shape (n, 2n). When n = 5, each row of m holds a random arrangement of integers 0,0,1,1,2,2,3,3,4,4.

import numpy as np

m = np.array([[4, 2, 2, 3, 0, 1, 3, 1, 0, 4],
              [2, 4, 0, 4, 3, 2, 0, 1, 1, 3],
              [0, 2, 3, 1, 3, 4, 2, 1, 4, 0],
              [2, 1, 2, 4, 3, 0, 0, 4, 3, 1],
              [2, 0, 1, 0, 3, 4, 4, 3, 2, 1]])

Given: array t has shape (n^2, 4). When n = 5, the first two columns (m_row, val) hold all 25 ordered pairs of the integers 0 to 4. The 1st column refers to rows of array m. The 2nd column refers to values in array m. For now, the last two columns hold the dummy value 99, which will be replaced.

t = np.array([[0, 0, 99, 99],
              [0, 1, 99, 99],
              [0, 2, 99, 99],
              [0, 3, 99, 99],
              [0, 4, 99, 99],
              [1, 0, 99, 99],
              [1, 1, 99, 99],
              [1, 2, 99, 99],
              [1, 3, 99, 99],
              [1, 4, 99, 99],
              [2, 0, 99, 99],
              [2, 1, 99, 99],
              [2, 2, 99, 99],
              [2, 3, 99, 99],
              [2, 4, 99, 99],
              [3, 0, 99, 99],
              [3, 1, 99, 99],
              [3, 2, 99, 99],
              [3, 3, 99, 99],
              [3, 4, 99, 99],
              [4, 0, 99, 99],
              [4, 1, 99, 99],
              [4, 2, 99, 99],
              [4, 3, 99, 99],
              [4, 4, 99, 99]])

PROBLEM: I want to replace the dummy values in the last two columns of t, as follows:
Consider t row [1, 3, 99, 99]. In row 1 of m, the value 3 appears in columns 4 and 9, so the t row is updated to [1, 3, 4, 9].
In the same way, t row [4, 2, 99, 99] becomes [4, 2, 0, 8].

I currently do this by looping through each column i of array m, looking for the two instances where m[m_row, i] == val, then updating array t. This is slow. Is there a way to speed up this process, perhaps using vectorization or broadcasting?
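For reference, the loop described above might look roughly like the following. This is only a sketch of the described approach, not the original code (which is not shown); the function name `fill_positions_loop` and the way `t` is built here are assumptions made for illustration.

```python
import numpy as np

m = np.array([[4, 2, 2, 3, 0, 1, 3, 1, 0, 4],
              [2, 4, 0, 4, 3, 2, 0, 1, 1, 3],
              [0, 2, 3, 1, 3, 4, 2, 1, 4, 0],
              [2, 1, 2, 4, 3, 0, 0, 4, 3, 1],
              [2, 0, 1, 0, 3, 4, 4, 3, 2, 1]])

def fill_positions_loop(m, t):
    """Hypothetical baseline: fill columns 2 and 3 of t by scanning rows of m."""
    out = t.copy()
    for k in range(out.shape[0]):
        m_row, val = out[k, 0], out[k, 1]
        slot = 2  # next column of `out` to write (2, then 3)
        for i in range(m.shape[1]):
            if m[m_row, i] == val:
                out[k, slot] = i
                slot += 1
                if slot == 4:  # both occurrences found, stop scanning
                    break
    return out

# Build t as in the question: all (m_row, val) pairs plus two dummy columns
n = m.shape[0]
t = np.full((n * n, 4), 99)
t[:, 0] = np.repeat(np.arange(n), n)
t[:, 1] = np.tile(np.arange(n), n)

result = fill_positions_loop(m, t)
print(result[8])   # the t row for m_row=1, val=3
print(result[22])  # the t row for m_row=4, val=2
```

The two Python-level loops touch every element of a row for every row of t, which is why this gets slow as n grows.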

Valdi_Bo · Accepted Answer

Use the following code:

import itertools

# First 2 columns
t = np.array(list(itertools.product(range(m.shape[0]), repeat=2)))
# Add columns - indices of "wanted" elements
t = np.hstack((t, np.apply_along_axis(lambda row, arr:
    np.nonzero(arr[row[0]] == row[1])[0], 1, t, m)))

The result, for your data sample (m array), is:

array([[0, 0, 4, 8],
       [0, 1, 5, 7],
       [0, 2, 1, 2],
       [0, 3, 3, 6],
       [0, 4, 0, 9],
       [1, 0, 2, 6],
       [1, 1, 7, 8],
       [1, 2, 0, 5],
       [1, 3, 4, 9],
       [1, 4, 1, 3],
       [2, 0, 0, 9],
       [2, 1, 3, 7],
       [2, 2, 1, 6],
       [2, 3, 2, 4],
       [2, 4, 5, 8],
       [3, 0, 5, 6],
       [3, 1, 1, 9],
       [3, 2, 0, 2],
       [3, 3, 4, 8],
       [3, 4, 3, 7],
       [4, 0, 1, 3],
       [4, 1, 2, 9],
       [4, 2, 0, 8],
       [4, 3, 4, 7],
       [4, 4, 5, 6]], dtype=int64)
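Note that np.apply_along_axis still calls the Python function once per row of t. If every row of m really does contain each value 0..n-1 exactly twice, a fully vectorized alternative is a stable argsort along each row. This is a different technique from the code above, shown only as a sketch under that assumption; the name `t_fast` is made up for illustration.

```python
import numpy as np

m = np.array([[4, 2, 2, 3, 0, 1, 3, 1, 0, 4],
              [2, 4, 0, 4, 3, 2, 0, 1, 1, 3],
              [0, 2, 3, 1, 3, 4, 2, 1, 4, 0],
              [2, 1, 2, 4, 3, 0, 0, 4, 3, 1],
              [2, 0, 1, 0, 3, 4, 4, 3, 2, 1]])
n = m.shape[0]

# Assumption: each row of m holds every value 0..n-1 exactly twice.
# A stable sort keeps equal values in ascending column order, so the sorted
# column indices come out as (both columns of 0, both columns of 1, ...).
order = np.argsort(m, axis=1, kind="stable")   # shape (n, 2n)
pairs = order.reshape(n, n, 2)                 # pairs[r, v] = columns of v in row r

# First two columns of t: all (m_row, val) pairs in row-major order
rows, vals = np.indices((n, n))
t_fast = np.column_stack((rows.ravel(), vals.ravel(), pairs.reshape(-1, 2)))

print(t_fast[8])   # the t row for m_row=1, val=3
```

If a row breaks the "exactly twice" assumption, the reshape silently pairs up the wrong columns, so this variant is only appropriate when the input is known to be well formed.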

Edit

The above code relies on the fact that each row in m contains exactly two occurrences of each "wanted" value.

To make the code robust against rows that contain too many or too few "wanted" values (or none at all):

Define a function that returns the indices of the "wanted" elements, padded with 99 and truncated to exactly two entries:

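def inds(row, arr):
    # Column indices in row arr[row[0]] where the value equals row[1]
    ind = np.nonzero(arr[row[0]] == row[1])[0]
    # Pad with 99 in case fewer than two were found, then keep the first two
    return np.pad(ind, (0, 2), constant_values=99)[:2]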

Then change the second statement to:

t = np.hstack((t, np.apply_along_axis(inds, 1, t, m)))

To test this variant, change the first row of m to:

[4, 2, 2, 3, 5, 5, 3, 1, 5, 4]

That is, the first row now contains no 0 at all and only a single 1.

Then the initial part of the result is:

array([[ 0,  0, 99, 99],
       [ 0,  1,  7, 99],

so that the missing indices in the result are filled with 99.
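The same padded result can also be sketched without any per-row Python call by using broadcasting. This is an alternative technique, not part of the code above; the helper name `first_two_positions` and its parameters `n_vals` and `fill` are hypothetical. It builds an (n, n, 2n) boolean mask, so memory grows with n^3.

```python
import numpy as np

def first_two_positions(m, n_vals, fill=99):
    """Hypothetical helper: first two columns of each value 0..n_vals-1 per row."""
    values = np.arange(n_vals)
    # mask[r, v, c] is True where m[r, c] == v
    mask = m[:, None, :] == values[None, :, None]
    # A stable argsort of the negated mask lists matching columns first,
    # in ascending column order; keep the first two candidates.
    order = np.argsort(~mask, axis=2, kind="stable")[:, :, :2]
    # A candidate is a real match only if the mask is True at that column.
    found = np.take_along_axis(mask, order, axis=2)
    pos = np.where(found, order, fill)
    rows, vals = np.indices((m.shape[0], n_vals))
    return np.column_stack((rows.ravel(), vals.ravel(), pos.reshape(-1, 2)))

# m with the modified first row from the test above
m_test = np.array([[4, 2, 2, 3, 5, 5, 3, 1, 5, 4],
                   [2, 4, 0, 4, 3, 2, 0, 1, 1, 3],
                   [0, 2, 3, 1, 3, 4, 2, 1, 4, 0],
                   [2, 1, 2, 4, 3, 0, 0, 4, 3, 1],
                   [2, 0, 1, 0, 3, 4, 4, 3, 2, 1]])

t_padded = first_two_positions(m_test, n_vals=5)
print(t_padded[:2])
```

Values outside 0..n_vals-1 (such as the 5s in the modified row) are simply ignored, and a value that occurs more than twice keeps only its first two columns.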
