Vectorizing this non-unique-key operation

Question

I have an array called test whose values are not unique. From it, I want to build an output vector, given rows, the set of values that should receive non-zero output, and data, which holds the output for each of those values.

import numpy as np

rows = np.array([3, 4])
test = np.array([1, 3, 3, 4, 5])
data = np.array([-1, 2])

My expected output is a vector of shape test.shape.

Each element of output is defined as follows:

if test[j] equals rows[i] for some index i, output[j] = data[i]
otherwise, output[j] = 0

In other words, the following generates my output.

output = np.zeros(test.shape)
for i, val in enumerate(rows):
    output[test == val] = data[i]

Is there any way of vectorizing this?

Divakar · Accepted Answer

Here's a vectorized approach based upon searchsorted (which assumes rows is sorted in ascending order, as it is in your example) -

# Get sorted index positions
idx = np.searchsorted(rows, test)

# Set out-of-bounds (invalid) indices to some dummy index, say 0
idx[idx==len(rows)] = 0

# Build a mask of invalid positions: index the rows array with those
# indices and flag the entries that do not match test
invalid_mask = rows[idx] != test

# Index data to get the output, then set the invalid positions to 0
out = data[idx]
out[invalid_mask] = 0

The last two lines could be replaced by either of two alternatives, if you dig one-liners -

out = data[idx] * (rows[idx] == test) # skips using `invalid_mask`

out = np.where(invalid_mask, 0, data[idx])
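
As a sanity check, the steps above can be wrapped in a function and compared against the loop from the question. This is only a sketch: the names map_values and loop_reference are made up for illustration, and the argsort/sorter step is an addition that is not part of the answer, included on the assumption that rows might not arrive sorted. It also assumes rows is non-empty and holds unique keys.

```python
import numpy as np

def map_values(rows, data, test):
    """Return data[i] where test equals rows[i], and 0 elsewhere.

    Hypothetical helper; assumes rows is non-empty with unique keys.
    """
    # Sort order of rows, so searchsorted also works when rows is unsorted
    order = np.argsort(rows)
    # Position of each test value within the sorted view of rows
    pos = np.searchsorted(rows, test, sorter=order)
    # Values larger than every key get a dummy in-bounds position
    pos[pos == len(rows)] = 0
    # Translate positions in the sorted view back to indices into rows/data
    idx = order[pos]
    # Keep data only where the candidate key really matches the test value
    return np.where(rows[idx] == test, data[idx], 0)

def loop_reference(rows, data, test):
    """The loop from the question, used here as ground truth."""
    output = np.zeros(test.shape)
    for i, val in enumerate(rows):
        output[test == val] = data[i]
    return output

test = np.array([1, 3, 3, 4, 5])

# Inputs from the question (rows already sorted)
rows = np.array([3, 4])
data = np.array([-1, 2])
out_sorted = map_values(rows, data, test)
print(out_sorted)  # values: 0, -1, -1, 2, 0

# Same keys in a different order, with data reordered to match
rows_shuffled = np.array([4, 3])
data_shuffled = np.array([2, -1])
out_shuffled = map_values(rows_shuffled, data_shuffled, test)

print(np.array_equal(out_sorted, loop_reference(rows, data, test)))
print(np.array_equal(out_shuffled,
                     loop_reference(rows_shuffled, data_shuffled, test)))
```

One difference worth knowing: the loop starts from np.zeros and so returns floats, while indexing data directly keeps the dtype of data (integers here). The values agree either way.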
