Sander van den Oord

Reputation: 12808

Select from each row randomly (weighted) a value from one of the columns

I have a large number of rows with 2 columns.
From each row I want to choose a value (with weighted probability) from either the first or the second column.

import numpy as np

values = np.array(
    [[0.41, 0.31],
     [0.73, 0.15],
     [0.44, 0.30],
     [0.67, 0.18],
])

I wanted to use np.random.choice to draw a column index (0 or 1) for each row, with weight 0.6 for the first column and 0.4 for the second column:

probs_chosen = np.random.choice([0,1], size=4, replace=True, p=[0.6, 0.4])
print(probs_chosen)
# e.g. [0 0 1 0]

But how do I use this index to select the first value from row 1, the first value from row 2, the second value from row 3, and so on?
Any other way to solve my problem is also fine, including a pandas solution.

Expected result in this case:

[0.41, 0.73, 0.30, 0.67]

Upvotes: 0

Views: 114

Answers (1)

user7864386

You can use NumPy advanced (integer array) indexing, pairing each row index with the column index drawn for that row:

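# One index per row: 0, 1, ..., n_rows - 1
row_idx = np.arange(values.shape[0])
# One weighted column draw per row (size follows the array instead of a hardcoded 4)
col_idx = np.random.choice([0,1], size=values.shape[0], replace=True, p=[0.6, 0.4])
# Element i of out is values[row_idx[i], col_idx[i]]
out = values[row_idx, col_idx]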

Output (when the random draw gives col_idx = [0, 0, 1, 0]):

array([0.41, 0.73, 0.3 , 0.67])
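
For reference, here is a self-contained sketch of the indexing approach above. It swaps in a seeded np.random.default_rng generator so the draw is repeatable; that choice, the seed value, and the names rng, n_rows, picked, picked_take, picked_where and example_idx are assumptions added for illustration, not part of the original answer. It also shows two alternatives that should give the same result, np.take_along_axis and np.where.

```python
import numpy as np

values = np.array(
    [[0.41, 0.31],
     [0.73, 0.15],
     [0.44, 0.30],
     [0.67, 0.18]]
)

# Seeded generator so the draw is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(0)

n_rows = values.shape[0]
row_idx = np.arange(n_rows)

# Draw one column index per row: 0 with probability 0.6, 1 with probability 0.4.
col_idx = rng.choice(2, size=n_rows, p=[0.6, 0.4])

# Advanced indexing: element i is values[row_idx[i], col_idx[i]].
picked = values[row_idx, col_idx]

# Alternative 1: take_along_axis needs the indices as a column vector.
picked_take = np.take_along_axis(values, col_idx[:, None], axis=1)[:, 0]

# Alternative 2: with only two columns, np.where can choose between them.
picked_where = np.where(col_idx == 0, values[:, 0], values[:, 1])

# The fixed draw from the question reproduces the expected result.
example_idx = np.array([0, 0, 1, 0])
example = values[row_idx, example_idx]  # [0.41, 0.73, 0.30, 0.67]
```

The np.where form only works for exactly two columns, while the advanced indexing and take_along_axis forms should extend to any number of columns, provided the probability list passed to choice has one entry per column.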

Upvotes: 1