use numpy random for fast random selection

Question

EDIT: I put real values so the code should work now:

I have 4 numpy arrays of the same length:

R = np.array[ 0.39374042  0.55270474  0.50848503  0.63725071  0.0350963   0.67203288
  0.03419264  0.60936204  0.3819783   0.17653394  0.76278053  0.85589961
  0.91961392  0.85265048  0.6108294   0.15980841  0.76017363  0.21771499
  0.25927199  0.39172983  0.36364338  0.77375089  0.92969549  0.01237327
  0.12195605  0.5587532   0.70229425  0.82809111  0.06700928  0.64284712
  0.15944779  0.76857694  0.35924588  0.75636962  0.25039875  0.60632514
  0.49124143  0.73741699  0.2178207   0.15998988  0.79652839  0.73693122]

R contains np.random.uniform values between 0 and 1

RGB = np.array[[[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[  0 110 105]
  [174  40 109]
  [  5  59 158]
  [  0 181 107]]

 [[  0 161  73]
  [182  48  57]
  [174  40 109]
  [ 32 134  39]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[174  40 109]
  [  5  59 158]
  [  0 181 107]
  [193  93 160]]

 [[219  99 109]
  [174  40 109]
  [  0 181 107]
  [104 162  26]]

 [[ 63 114 221]
  [  0 172 192]
  [ 32  77 211]
  [187  77 195]]

 [[219  99 109]
  [238  67  47]
  [ 87 194  65]
  [176 187   0]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[110 213 158]
  [235 154 164]
  [  0 190 211]
  [187  77 195]]

 [[219  99 109]
  [110 213 158]
  [235 154 164]
  [193  93 160]]

 [[219  99 109]
  [110 213 158]
  [ 87 194  65]
  [  0 181 107]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[219 162 208]
  [110 213 158]
  [235 154 164]
  [167 233 196]]

 [[110 213 158]
  [235 154 164]
  [255 226 130]
  [167 233 196]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[239 212 240]
  [136 220 234]
  [208 242 247]
  [167 233 196]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]

 [[118 187 239]
  [219 162 208]
  [136 220 234]
  [  0 190 211]]]

RGB contains sets of 4 RGB color values

A = np.array[ -1  -1  -1  -1  -1  -1 159 148  -1  -1  -1  -1  45  97  57  80  -1  -1
  60  84  86  -1  -1  -1 112  72  -1  -1  -1  -1 133  -1  -1  -1  -1  -1
  -1  -1  -1  -1  -1  -1]

A contains either -1, or indices of sets of 4 values in B

B = np.array[[-1.54525517  1.09915125 -1.74258272  3.18868664]
 [-1.04522579  1.04443564 -1.83606995  2.8368601 ]
 [-2.51080789  1.4980096  -1.62047649  3.63327478]
 [-0.75381136  0.38494135 -1.76551145  3.13438146]
 [-0.42398121  1.45934623 -2.04842952  2.0130645 ]
 [-3.58516396  1.51736923 -1.40004578  4.46784052]
 [ 0.46980945  0.23242436  0.15812529  0.1396409 ]
 [ 0.02950557  0.43429909  0.34304701  0.19314833]
 [-2.44568468  0.98038089 -1.01581931  3.4811231 ]
 [ 1.38051381  0.11766208 -1.64270991  1.14453402]
 [-2.90620565  2.2622749  -1.17507929  2.81901003]
 [-2.11827269  0.28090219 -0.94480948  3.78217997]
 [ 0.07434589  0.03948412  0.45858244  0.42758754]
 [ 0.10902808  0.29206381  0.50180905  0.09709906]
 [ 0.02106152  0.62921187  0.12285574  0.22687087]
 [ 0.25688419  0.62417539  0.01976311  0.09917731]
 [-2.56431038  0.58433235 -0.32521341  3.30519143]
 [ 3.34007944 -0.24491683 -1.39262584 -0.70253677]
 [ 0.43784474  0.09927102  0.12535527  0.33752897]
 [ 0.18369437  0.15869915  0.55640207  0.10120441]
 [ 0.13323323  0.23276694  0.33810426  0.29589556]
 [ 2.31472564 -0.25736362 -0.51265688 -0.54470514]
 [-3.13602078  2.46578654  0.08271576  1.58751849]
 [-2.08295869 -0.0948967   0.37305594  2.80479945]
 [ 0.39357387  0.12289595  0.12890858  0.3546216 ]
 [ 0.3637729   0.35308756  0.03283074  0.25030881]
 [ 0.91809484  0.00616419  0.47102103 -0.39528007]
 [-2.10633552  1.9717707   0.71079464  0.42377018]
 [-2.63786465  0.31323965  1.15219987  2.17242513]
 [ 4.66105371 -0.67514766 -0.17463501 -2.81127104]
 [ 0.4466582   0.12232826  0.19249585  0.2385177 ]
 [-1.1656546   1.27760641  1.48320113 -0.59515294]
 [-2.54309788  0.61607798  1.90256384  1.02445605]
 [ 3.38699312 -0.695849    0.92595314 -2.61709726]
 [-3.3691958   2.67546554  1.66471811  0.02901215]
 [-2.01283737 -0.53906846  2.02201185  1.52989397]
 [-0.7635726   0.59671731  2.45595894 -1.28910365]
 [-1.6913111   0.68635463  2.63177913 -0.62682267]
 [ 1.67630612 -0.3755707   2.14031351 -2.44104893]
 [-2.03409447  2.03385782  2.43486791 -1.43463126]
 [-2.68827085 -0.01102552  2.97885322  0.72044315]
 [ 6.15409418 -1.17188198  1.36416304 -5.34637524]]

Each row of B contains 4 values that add to 1

What I want to do is to create a new array N that will contain RGB values that are chosen with a method similar to this one, but using numpy to avoid the time-consuming loop.

Code example with the loop:

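new_array = []

for i in range(len(A)):
    if A[i] != -1:
        # cumulative thresholds built from the probabilities in row i
        a = B[i][0]
        b = B[i][1] + a
        c = B[i][2] + b

        # the four candidate colours for row i
        u = RGB[i][0]
        v = RGB[i][1]
        w = RGB[i][2]
        x = RGB[i][3]

        random = R[i]

        # pick the first colour whose cumulative threshold covers the random value
        if random <= a:
             new_array = new_array + [u]
        elif random <= b:
             new_array = new_array + [v]
        elif random <= c:
             new_array = new_array + [w]
        else:
             new_array = new_array + [x]

    else:
        # no probabilities for this row: use black
        new_array = new_array + [[0, 0, 0]]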

Is there a way to rewrite this function entirely in numpy? Thanks

DrV · Accepted Answer

I'll try to paraphrase the problem.

input array A (N integers valued 0..M-1 or -1) pointing to probability vectors B
input array B (M x 4) giving probabilities (each row sums up to 1)
colour table RGB (N x 4 x 3) giving RGB triplets to choose from
input vector R (N) containing uniformly distributed random values [0,1]

So, for each value of A:

the probability vector is picked from B
the random value is used to choose which of the four alternatives will be picked from the same row in RGB

In addition, there are two extra rules:

if A[n]==-1, then the corresponding output colour is black
if there are negative probabilities in the vector picked from B (that is, in B[A[n]]), then the corresponding output colour is black

The output will be an N x 3 colour array.
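To make the rules concrete, here is a plain-loop sketch of my reading of the problem. The function name reference_colours and the toy arrays are invented for illustration; they are not from the question.

```python
import numpy

def reference_colours(A, B, RGB, R):
    """Loop version of the rules above (hypothetical helper, for checking only)."""
    out = numpy.zeros((len(A), 3), dtype=RGB.dtype)    # black by default
    for n in range(len(A)):
        if A[n] == -1:
            continue                                   # rule 1: A[n] == -1
        p = B[A[n]]
        if p.min() < 0:
            continue                                   # rule 2: negative probability
        thresholds = numpy.cumsum(p)[:3]               # e.g. (.2, .5, .9)
        k = int(numpy.sum(R[n] > thresholds))          # bin index 0..3
        out[n] = RGB[n, k]
    return out

# Toy data, made up for this example
A = numpy.array([0, -1, 1, 2])
B = numpy.array([[0.2, 0.3, 0.4, 0.1],
                 [0.25, 0.25, 0.25, 0.25],
                 [1.5, -0.5, 0.0, 0.0]])
RGB = numpy.arange(4 * 4 * 3).reshape(4, 4, 3)
R = numpy.array([0.95, 0.5, 0.3, 0.1])

result = reference_colours(A, B, RGB, R)
```

Rows 1 and 3 come out black (A is -1, and the chosen B row has a negative entry, respectively); the other two rows pick one of their four colours.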

So, let us first construct a clean probability vector so that impossible combinations are represented by [-1,0,0,0].

import numpy

# get the number of rows:
N = len(A)

# create a boolean array to show which indices in A are valid
A_valid = (A != -1)

# get B vectors for all valid points in A
B_vectors = B[A[A_valid]]

# clean the B_vectors so that if there are <0 vectors, they are replaced by -1,0,0,0
B_vectors[numpy.amin(B_vectors, axis=1) < 0] = [-1.0, 0.0, 0.0, 0.0]

# create a clean probability table (N x 4)
probs = numpy.empty((N, 4))
# fill in the probabilities where they can be picked from B
probs[A_valid] = B_vectors
# fill the rest with -1,0,0,0 (~ inverts a boolean mask)
probs[~A_valid] = [-1, 0, 0, 0]

Now we have a table with either real probabilities (non-negative numbers summing to 1) or (-1,0,0,0) on each row. The latter marks rows where A holds -1 or where the probability vector picked from B is impossible.

The probability vectors are easier to use if they are turned into cumulative probabilities. For example, the probability vector (.2, .3, .4, .1) is transformed into (.2, .5, .9, 1.0). In this form the random number r can be compared directly against the entries to see which bin should be chosen.
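As a small illustration of that idea for a single row (the helper name pick_bin is made up here):

```python
import numpy

p = numpy.array([0.2, 0.3, 0.4, 0.1])
cum = numpy.cumsum(p)          # approximately (0.2, 0.5, 0.9, 1.0)

def pick_bin(r, cum):
    """Count how many of the first three thresholds r exceeds (gives 0..3)."""
    return int(numpy.sum(r > cum[:-1]))

bins = [pick_bin(r, cum) for r in (0.1, 0.35, 0.7, 0.95)]
# bins == [0, 1, 2, 3]
```

The last cumulative value is never needed for the comparison, because anything above the third threshold falls into the last bin.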

The next step is to obtain the colour bins (0,1,2,3) by using this approach:

# cumulative probabilities
cumprobs = numpy.cumsum(probs, axis=1)

# color indices (integer dtype, because they are used as indices later)
cidx = numpy.zeros(N, dtype=int)

# compare the random vector R to the cumulative probabilities
cidx[R > cumprobs[:,0]] = 1
cidx[R > cumprobs[:,1]] = 2
cidx[R > cumprobs[:,2]] = 3

(As far as I know, there is no single numpy function that performs this row by row. numpy.digitize takes only one 1-d array of bin edges, so it cannot use a different set of bins for each row.)
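One compact alternative, offered as a sketch rather than a built-in: compare each random value with the first three thresholds of its own row using broadcasting, and count how many it exceeds. This assumes each row of cumulative values is non-decreasing (or is the all -1 marker row). The arrays below are toy values.

```python
import numpy

cumprobs = numpy.array([[0.2, 0.5, 0.9, 1.0],
                        [0.25, 0.5, 0.75, 1.0],
                        [-1.0, -1.0, -1.0, -1.0]])   # last row: an invalid entry
R = numpy.array([0.95, 0.3, 0.6])

# R[:, None] has shape (N, 1) and broadcasts against the (N, 3) thresholds;
# summing the boolean result along each row gives the bin index 0..3.
cidx = (R[:, None] > cumprobs[:, :3]).sum(axis=1)
```

The result is already an integer array, so it can be used for indexing directly.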

It should be noted that if for some row the cumulative probabilities are (.2, .5, .9, 1.0) and r for the same row is 0.95, cidx is first 0 (after the array creation), then set to 1 (because r>.2), then to 2 (because r>.5) and finally to 3 (because r>.9).

Then we can create the output colour table by using cidx and RGB:

# pick the item defined by cidx for each row
rainbow = RGB[numpy.arange(N), cidx]

This picks the colour specified by the corresponding cidx and RGB values on that row.
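A tiny example of this kind of paired integer indexing, with an invented colour table so the picks are easy to follow:

```python
import numpy

N = 3
RGB = numpy.arange(N * 4 * 3).reshape(N, 4, 3)   # toy colour table, shape (N, 4, 3)
cidx = numpy.array([3, 0, 2])                    # chosen bin per row (integers)

# Row n of the result is RGB[n, cidx[n]], so the output has shape (N, 3).
picked = RGB[numpy.arange(N), cidx]
```

Because both index arrays have length N, numpy pairs them element by element instead of taking every combination.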

Finally, we have to blacken out all invalid colours:

# if the probability starts with -1, then we'll blacken the color out
rainbow[probs[:,0] < 0.] = [0,0,0]

Now the result should be in rainbow.
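Putting the steps together, a minimal sketch could look like the following. The function name pick_colours and the randomly generated test data are assumptions for illustration. The bin index is computed with a single broadcast comparison, which I expect to be equivalent to three successive threshold assignments for non-decreasing cumulative rows.

```python
import numpy

def pick_colours(A, B, RGB, R):
    """Vectorised colour selection (hypothetical wrapper around the steps above)."""
    N = len(A)
    A_valid = (A != -1)

    # clean probability table: (-1,0,0,0) marks rows that must become black
    probs = numpy.empty((N, 4))
    probs[~A_valid] = [-1, 0, 0, 0]
    B_vectors = B[A[A_valid]]                    # integer-array indexing gives a copy
    B_vectors[B_vectors.min(axis=1) < 0] = [-1, 0, 0, 0]
    probs[A_valid] = B_vectors

    # bin index 0..3 for each row
    cumprobs = numpy.cumsum(probs, axis=1)
    cidx = (R[:, None] > cumprobs[:, :3]).sum(axis=1)

    # pick the colours, then blacken the invalid rows
    rainbow = RGB[numpy.arange(N), cidx]
    rainbow[probs[:, 0] < 0] = 0
    return rainbow

# Made-up test data; the seed is arbitrary
rng = numpy.random.default_rng(0)
M, N = 5, 1000
B = rng.dirichlet(numpy.ones(4), size=M)         # M valid probability rows
A = rng.integers(-1, M, size=N)                  # -1 or an index into B
RGB = rng.integers(0, 256, size=(N, 4, 3))
R = rng.uniform(size=N)

out = pick_colours(A, B, RGB, R)
```

The inputs are left untouched, since both B[A[A_valid]] and the final colour lookup produce copies.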
