Reputation: 1

How to choose a random row from a 2D np array with an np array of probabilities?

I'm having some difficulty choosing a random row (a point, in my case) from my np array. I want to do that with a probability for each point, so I have a P_i np array in which each row is the probability for one point. I tried np.random.choice on the array itself and got an error saying it must be a 1-D array, so I called np.random.choice on the number of rows instead to get a random row index. But how do I do that with a probability for each point?

Upvotes: 0

Answers (1)

Whole Brain

Reputation: 2167

You can use np.random.choice with a probability distribution that sums to 1.

Getting probabilities that sum up to 1

Reshaping

If your probabilities already sum to 1, then you simply need to squeeze your probability vector down to 1-D:

import numpy as np
# Example of probability vector
probs = np.array([[0.1, 0.2, 0.5, 0.2]])
# array([[0.1, 0.2, 0.5, 0.2]])
probs.shape
# > (1, 4)
p_squeezed = probs.squeeze()
# > array([0.1, 0.2, 0.5, 0.2])
p_squeezed.shape
# > (4,)
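As a quick sanity check before sampling, the squeezed vector can be validated against what np.random.choice expects of p: one dimension, no negative entries, and a sum of 1. This is only a sketch; the name is_valid is made up for illustration.

```python
import numpy as np

probs = np.array([[0.1, 0.2, 0.5, 0.2]])
# ravel() also flattens to 1-D, and unlike squeeze() it never yields a 0-d array
p = probs.ravel()

# p must be 1-D, non-negative, and sum to 1 (within floating-point tolerance)
is_valid = p.ndim == 1 and np.all(p >= 0) and np.isclose(p.sum(), 1.0)
print(is_valid)
```

If the check fails only because of the sum, the normalization step described next applies.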

Getting a proper probability distribution

If your own probs don't add up to 1, then you can normalize them, either by dividing by their sum (which assumes they are non-negative) or by applying a softmax.

Just generating random data:

import numpy as np
# Random 2D points
points = np.random.randint(0,10, size=(10,2))
# random independent probabilities
probs = np.random.rand(10).reshape(-1, 1)
data = np.hstack((probs, points))
print(data)
# > array([[0.01402932, 5.        , 5.        ],
#          [0.01454579, 5.        , 6.        ],
#          [0.43927214, 1.        , 7.        ],
#          [0.36369286, 3.        , 7.        ],
#          [0.09703463, 9.        , 9.        ],
#          [0.56977406, 1.        , 4.        ],
#          [0.0453545 , 4.        , 2.        ],
#          [0.70413767, 4.        , 4.        ],
#          [0.72133774, 7.        , 1.        ],
#          [0.27297051, 3.        , 6.        ]])

Applying softmax:

from scipy.special import softmax
scale_softmax = softmax(data[:,0])
# > array([0.07077797, 0.07081454, 0.1082876 , 0.10040494, 0.07690364,
#  0.12338291, 0.0730302 , 0.14112644, 0.14357482, 0.09169694])

Applying division by the sum:

scale_divsum = data[: ,0] / data[:, 0].sum()
# > array([0.00432717, 0.00448646, 0.13548795, 0.11217647, 0.02992911,
#  0.17573962, 0.01398902, 0.21718238, 0.22248752, 0.08419431])

[Figure: cumulative distributions of the two scaling functions proposed above (softmax and division by the sum)]

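Softmax flattens the distribution, so all points end up with more similar chances of being picked than with division by the sum. Division by the sum keeps the ratios between your original values, so it probably fits your needs better.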
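To make that difference concrete, here is a small sketch comparing the two scalings on a hypothetical weight vector (the values are made up for illustration). The softmax is written out by hand with NumPy so the snippet doesn't depend on SciPy.

```python
import numpy as np

# Hypothetical raw weights in [0, 1]; they do not sum to 1
weights = np.array([0.01, 0.1, 0.5, 0.9])

# Softmax by hand; subtracting the max is a standard numerical-stability trick
exp_w = np.exp(weights - weights.max())
p_softmax = exp_w / exp_w.sum()

# Division by the sum preserves the ratios between the original weights
p_divsum = weights / weights.sum()

# How much more likely is the most probable point than the least probable one?
ratio_softmax = p_softmax.max() / p_softmax.min()
ratio_divsum = p_divsum.max() / p_divsum.min()
print(ratio_softmax, ratio_divsum)
```

With these weights the ratio is roughly 2.4 for softmax but 90 for division by the sum, which is the same 0.9 / 0.01 ratio as in the raw weights.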

Picking a random row

Now you can use np.random.choice and pass your probability distribution as the parameter p:

rand_idx = np.random.choice(np.arange(len(data)), p=scale_softmax)
data[rand_idx]
# > array([0.70413767, 4.        , 4.        ])

# or just the point:
data[rand_idx, 1:]
# > array([4., 4.])
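As an alternative sketch, NumPy's newer Generator API (np.random.default_rng) can sample rows from the 2D array directly, so no separate index step is needed. The points, weights and seed below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

# Hypothetical points and per-point weights (the weights need not sum to 1)
points = np.array([[5, 5], [1, 7], [3, 7], [9, 9]])
weights = np.array([0.1, 0.4, 0.3, 0.6])
p = weights / weights.sum()

# Option 1: sample a row index, then index into the array
idx = rng.choice(len(points), p=p)
row = points[idx]

# Option 2: pass the 2D array itself; rows are sampled along axis 0
several_rows = rng.choice(points, size=3, p=p, axis=0)
print(row, several_rows)
```

Here p must have one entry per row, i.e. its length must equal points.shape[0].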

Upvotes: 2