Vector generated numpy column with values under condition

Question

Suppose I have a numpy array generated like so:

import numpy as np

np.random.seed(1)
arr = np.random.randint(10, size=(5, 2))

which produces the following:

array([[5, 8],
       [9, 5],
       [0, 0],
       [1, 7],
       [6, 9]])

How do I:

add a new random column, drawn from its own range, while making sure that its values DO NOT match the values in the other columns of the same row? Simply appending another random array would not work, because nothing guarantees that it contains no duplicates.

The following would be illegal because, as the first row shows, the third column is 8 and the second column is also 8:

np.append(arr, np.random.randint(10,size=(5,1)), axis=1)

array([[5, 8, 8],
       [9, 5, 8],
       [0, 0, 6],
       [1, 7, 2],
       [6, 9, 8]])
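For reference, a clash like this can be detected without a loop by broadcasting the candidate column against the existing ones. This is only a minimal sketch; the names candidate and conflicts are chosen here for illustration.

```python
import numpy as np

arr = np.array([[5, 8], [9, 5], [0, 0], [1, 7], [6, 9]])
candidate = np.array([8, 8, 6, 2, 8])  # the third column from the example above

# True for every row where the candidate equals a value already in that row
conflicts = (arr == candidate[:, None]).any(axis=1)
print(conflicts)  # only the first row clashes
```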

A sub-question:

How do I generate another column whose values are distinct from selected columns only? For example, it is OK for a value to equal the second column, but it must not match the first column.

I understand that this can be done using standard for loops, but this would dramatically decrease the performance if we are talking about millions of rows, so I am looking for a vectorized solution.

kuzand · Accepted Answer

Here's one way to do it:

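import random

import numpy as np

def my_randint(a, select_columns, n):
    # a is one row of arr; keep the integers in [0, n) that do not appear
    # in the selected columns of that row
    integers = list(set(range(n)).difference(a[select_columns]))
    # random.choice raises IndexError if every integer in [0, n) is excluded
    return random.choice(integers)

# my_randint is called once per row
new_col = np.apply_along_axis(my_randint, axis=1, arr=arr, select_columns=[0, 1], n=10)
new_arr = np.hstack([arr, new_col[:, None]])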

Note that I use random.choice instead of np.random.choice because it is faster.
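Since np.apply_along_axis still calls the Python function once per row, a loop-free variant may scale better to millions of rows. The following is a minimal sketch under stated assumptions, not a benchmarked implementation: the function name random_column_excluding and its rng parameter are made up for illustration, and it assumes n is small enough that a (rows, n) array fits in memory. It marks the forbidden values of each row in a boolean mask, gives every allowed value a random score, and picks the value with the highest score, which selects uniformly among the allowed values of each row.

```python
import numpy as np

def random_column_excluding(arr, select_columns, n, rng=None):
    """Draw one integer in [0, n) per row, avoiding that row's values
    in select_columns. Hypothetical helper, for illustration only."""
    rng = np.random.default_rng() if rng is None else rng
    rows = arr.shape[0]

    # allowed[i, v] is True if value v may be used in row i
    allowed = np.ones((rows, n), dtype=bool)
    excluded = arr[:, select_columns]                  # shape (rows, k)
    in_range = (excluded >= 0) & (excluded < n)        # ignore values outside [0, n)
    row_idx = np.broadcast_to(np.arange(rows)[:, None], excluded.shape)
    allowed[row_idx[in_range], excluded[in_range]] = False

    if not allowed.any(axis=1).all():
        raise ValueError("at least one row has no allowed value left")

    # Random scores in [0, 1); forbidden values get -1 so they never win
    scores = rng.random((rows, n))
    scores[~allowed] = -1.0
    return scores.argmax(axis=1)

arr = np.array([[5, 8], [9, 5], [0, 0], [1, 7], [6, 9]])

# Distinct from both columns
new_col = random_column_excluding(arr, [0, 1], n=10)
new_arr = np.hstack([arr, new_col[:, None]])

# Sub-question: distinct from the first column only
col_first_only = random_column_excluding(arr, [0], n=10)
```

The memory cost is proportional to rows × n, so for a very large range a different scheme, such as redrawing only the clashing rows, may be a better fit.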
