Random valid data items in numpy array

Question

Suppose I have a numpy array as follows:

data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])

I would like to randomly select n valid (non-NaN) items from the array, together with their indices.

Does numpy provide an efficient way of doing this?

Paul Panzer · Accepted Answer

Example

import numpy as np

data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
n = 5

Get valid indices

y_val, x_val = np.where(~np.isnan(data))
n_val = y_val.size

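Pick n random positions among the valid ones (np.random.choice samples with replacement by default, so the same item can come up more than once; pass replace=False for distinct items)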

pick = np.random.choice(n_val, n)

Apply index to valid coordinates

y_pick, x_pick = y_val[pick], x_val[pick]

Get corresponding data

data_pick = data[y_pick, x_pick]

Admire

data_pick
# array([2., 8., 1., 1., 2.])
y_pick
# array([3, 0, 0, 2, 3])
x_pick
# array([3, 2, 0, 2, 3])
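
The steps above can be wrapped into one helper. This is only a sketch, not part of the original answer: the name sample_valid and its replace and seed parameters are made up for illustration. It uses np.random.default_rng and defaults to sampling without replacement, so each picked item is distinct.

```python
import numpy as np


def sample_valid(data, n, replace=False, seed=None):
    """Return (values, rows, cols) for n randomly chosen non-NaN entries.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    rng = np.random.default_rng(seed)
    # Row and column coordinates of every non-NaN entry.
    rows, cols = np.nonzero(~np.isnan(data))
    if not replace and n > rows.size:
        raise ValueError("n exceeds the number of valid entries")
    # Draw n positions into the list of valid coordinates.
    pick = rng.choice(rows.size, size=n, replace=replace)
    return data[rows[pick], cols[pick]], rows[pick], cols[pick]


data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9],
                 [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
values, rows, cols = sample_valid(data, 5, seed=0)
```

With replace=False the helper raises a ValueError when n is larger than the number of valid entries; pass replace=True to get the behaviour of the original snippet, where duplicates are allowed.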
