Select one random index per unique element in NumPy array and account for missing ones from reference array

Question

Suppose I have the following:

import numpy as np

mid_img = np.array([[0, 0, 1],
                    [2, 0, 2],
                    [3, 1, 0]])

values = np.array([0, 1, 2, 3, 4])              

locations = np.full((len(values), 2), [-1, -1])
locations[np.argwhere(mid_img == values)] = mid_img  # this of course doesn't work, but hopefully shows intent

'locations' would then look something like this (shown only as an intermediate step for explanation; producing this output is not required):

[[[0, 0], [0, 1], [1, 1], [2, 2]],  # i.e., locations matching values[0]
 [[0, 2], [2, 1]],                  # i.e., locations matching values[1]
 [[1, 0], [1, 2]],                  # i.e., locations matching values[2]
 [[2, 0]],                          # i.e., locations matching values[3]
 [[-1, -1]]]                        # i.e., values[4] not found
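
For reference, this intermediate grouping can be sketched with one np.argwhere call per value. This is only an illustration and assumes a plain Python list of arrays is acceptable; the name `groups` is hypothetical, and a missing value yields an empty (0, 2) array rather than the [[-1, -1]] placeholder shown above.

```python
import numpy as np

mid_img = np.array([[0, 0, 1],
                    [2, 0, 2],
                    [3, 1, 0]])
values = np.array([0, 1, 2, 3, 4])

# One (k, 2) array of [row, col] pairs per value; shape (0, 2) if the value is absent
groups = [np.argwhere(mid_img == v) for v in values]

for v, g in zip(values, groups):
    print(v, g.tolist())
```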

The final output would then hold one randomly selected location for each value:

print(locations)

Output:

[[0, 1],
 [2, 1],
 [1, 0],
 [2, 0],
 [-1, -1]]

Here is a looped version of the process:

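for row_index in range(len(values)):
    # Compare against the value itself, not its position in values
    found_indices = np.argwhere(mid_img == values[row_index])
    # Keep the [-1, -1] placeholder when the value does not occur
    if len(found_indices) > 0:
        locations[row_index] = found_indices[np.random.randint(len(found_indices))]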

Divakar · Accepted Answer

Here's one vectorized way -

# Get flattened sort indices for input array
idx = mid_img.ravel().argsort()

# Get counts for all unique elements
c = np.bincount(mid_img.flat)
c = c[c>0]

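# Get bins to be used with searchsorted later on so that we select
# exactly one index per group: group i spans (i, i+1], split into c[i]
# equal slots, and a random number in [i, i+1) lands in one of them.
# The selected positions map to linear indices through idx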
bins = np.repeat(1.0/c,c).cumsum()
n = len(c)
sidx = np.searchsorted(bins,np.random.rand(n)+np.arange(n))
out_lidx = idx[sidx]

# Convert to row-col index format
row,col = np.unravel_index(out_lidx, mid_img.shape)

# Initialize output array
locations = np.full((len(values), 2), [-1, -1])

# Get valid ones based on values and indexed output. This assumes values is
# sorted and every value up to the maximum of mid_img occurs in mid_img
valid = values <= mid_img[row[-1],col[-1]]

# Finally assign row, col indices into final output
locations[valid,0] = row
locations[valid,1] = col
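
The `valid` mask above lines up with `row` and `col` only when the elements present in mid_img are exactly the entries of values up to the maximum. As a hedged sketch for the more general case, the variant below swaps the floating-point bins for integer offsets from np.unique and matches values against the unique elements with np.searchsorted. The function name `random_location_per_value` and the `rng` parameter are illustrative assumptions, not part of the answer above, and the sketch assumes a non-empty 2-D integer image and a sorted values array.

```python
import numpy as np

def random_location_per_value(img, values, rng=None):
    """Pick one random (row, col) per entry of values; (-1, -1) if absent.

    Assumes img is a non-empty 2-D array and values is sorted.
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = img.ravel()

    # Sorting groups equal elements together in the flattened index order
    order = flat.argsort(kind="stable")
    uniq, counts = np.unique(flat, return_counts=True)

    # Start of each group in the sorted order, plus a random offset inside it
    starts = np.cumsum(counts) - counts
    picks = starts + rng.integers(0, counts)

    # Convert the chosen linear indices to row/col format
    rows, cols = np.unravel_index(order[picks], img.shape)

    # Locate each requested value among the unique elements
    pos = np.minimum(np.searchsorted(uniq, values), len(uniq) - 1)
    found = uniq[pos] == values

    locations = np.full((len(values), 2), -1)
    locations[found, 0] = rows[pos[found]]
    locations[found, 1] = cols[pos[found]]
    return locations
```

Calling `random_location_per_value(mid_img, values)` with the arrays from the question returns a (5, 2) array whose last row stays [-1, -1]. Passing a seeded generator such as `np.random.default_rng(0)` makes the selection repeatable.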
