Fallen Apart
Fallen Apart

Reputation: 741

How to create a numpy array with a random entries that exclude one element in each index?

I have an array val of possible values (ex. val = [0, 1, 2, 3, 4, 5]) and an array A (possibly very long list) of selected values (ex. A = [2, 3, 1, 0, 2, 1, ... , 2, 3, 1, 0, 4])

Now I want to create an array B of the same length as A such that A[i] is different than B[i] for each i and entries in B are selected randomly. How to do it efficiently using numpy?

Upvotes: 2

Views: 823

Answers (5)

Paul Panzer
Paul Panzer

Reputation: 53029

A simple method would be drawing the difference between A and B modulo n where n is the number of possible outcomes. A[i] != B[i] means that this difference is not zero, hence we draw from 1,...,n-1:

n,N = 10,100
A = np.random.randint(0,n,N)

D = np.random.randint(1,n,N)
B = (A-D)%n

Update: while arguably elegant this solution is not the fastest. We could save some time by replacing the (slow) modulo operator with just testing for negative values and adding n to them.

In this form this solution starts looking quite similar to @Divakar's: two blocks of possible values, one needs to be shifted.

But we can do better: instead of shifting on average half the values we can instead swap them out only if A[i] == B[i]. As this is expected to happen rarely unless the list of permissible values is very short, the code runs faster:

B = np.random.randint(1,n,N)
B[B==A] = 0

Upvotes: 1

Divakar
Divakar

Reputation: 221514

Here's one vectorized way -

def randnum_excludeone(A, val):
    n = val[-1]
    idx = np.random.randint(0,n,len(A))
    idx[idx>=A] += 1
    return idx

The idea is we generate random integers for each entry in A covering the entire length of val minus 1. Then, we add in 1 if the current random number generated is same or greater than current A element, otherwise we keep it. Thus, for any random number generated that's lesser than current A number, we keep it. Otherwise, with 1 addition, we will offset from the current A number. That's our final output - idx.

Let's verify the random-ness and make sure it's uniform across non-A elements -

In [42]: A
Out[42]: array([2, 3, 1, 0, 2, 1, 2, 3, 1, 0, 4])

In [43]: val
Out[43]: array([0, 1, 2, 3, 4, 5])

In [44]: c = np.array([randnum_excludeone(A, val) for _ in range(10000)])

In [45]: [np.bincount(i) for i in c.T]
Out[45]: 
[array([2013, 2018,    0, 2056, 1933, 1980]),
 array([2018, 1985, 2066,    0, 1922, 2009]),
 array([2032,    0, 1966, 1975, 2040, 1987]),
 array([   0, 2076, 1986, 1931, 2013, 1994]),
 array([2029, 1943,    0, 1960, 2100, 1968]),
 array([2028,    0, 2048, 2031, 1929, 1964]),
 array([2046, 2065,    0, 1990, 1940, 1959]),
 array([2040, 2003, 1935,    0, 2045, 1977]),
 array([2008,    0, 2011, 2030, 1937, 2014]),
 array([   0, 2000, 2015, 1983, 2023, 1979]),
 array([2075, 1995, 1987, 1948,    0, 1995])]

Benchmarking on large arrays

Other vectorized approach(es) :

# @Paul Panzer's solution
def pp(A, val):
    n,N = val[-1]+1,len(A)    
    D = np.random.randint(1,n,N)
    B = (A-D)%n
    return B

Timing results -

In [66]: np.random.seed(0)
    ...: A = np.random.randint(0,6,100000)

In [67]: %timeit pp(A,val)
100 loops, best of 3: 3.11 ms per loop

In [68]: %timeit randnum_excludeone(A, val)
100 loops, best of 3: 2.53 ms per loop

In [69]: np.random.seed(0)
    ...: A = np.random.randint(0,6,1000000)

In [70]: %timeit pp(A,val)
10 loops, best of 3: 39.9 ms per loop

In [71]: %timeit randnum_excludeone(A, val)
10 loops, best of 3: 25.9 ms per loop

Extending the range of val to 10 -

In [60]: np.random.seed(0)
    ...: A = np.random.randint(0,10,1000000)

In [61]: %timeit pp(A,val)
10 loops, best of 3: 31.2 ms per loop

In [62]: %timeit randnum_excludeone(A, val)
10 loops, best of 3: 23.6 ms per loop

Upvotes: 1

JohanC
JohanC

Reputation: 80289

Here is another approach. B first gets a random shuffle of A. Then, all the values where A and B overlap get shuffled. In the special case where all the overlapping elements have the same value, they get swapped with random good values.

Interesting on this approach is that it also works when there A only contains a very limited set of different values. Unlike other approaches, Bis an exact shuffle of A, so it also works when A doesn't have a uniform distribution. Also, B is a completely random shuffle except for the requirement of being different at equal indices.

import random
N = 10000
A = [random.randrange(0,6) for _ in range(N)]
B = a.copy()
random.shuffle(b)
print(A)
print(B)

while True:
    equal_vals = {i for i,j in zip(A, B) if i == j}
    print(len(equal_vals), equal_vals)
    if len(equal_vals) == 0:  # finished, no equal values on same positions
        break
    else:
        equal_ind = [k for k, (i, j) in enumerate(zip(A, B)) if i == j]
        # create a list of indices where A and B are equal
        random.shuffle(equal_ind)  # as the list was ordened, shuffle it to get a random order

        if len(equal_vals) == 1:  # special case, all equal indices have the same value
            special_val = equal_vals.pop()
            # find all the indices where the special_val could be placed without problems
            good_ind = [k for k,(i,j) in enumerate(zip(A, B)) if i != special_val and j != special_val]
            if len(good_ind) < len(equal_ind):
                print("problem: there are too many equal values in list A")
            else:
                # swap each bad index with a random good index
                chosen_ind = random.sample(good_ind, len(equal_ind))
                for k1, k2 in zip(equal_ind, chosen_ind):
                    b[k1], b[k2] = b[k2], b[k1]  # swap
            break
        elif len(equal_vals) >= 2:

            # permute B via the lis of equal indices;
            #   as there are at least 2 different values, at least two indices will get a desired value
            prev = equal_ind[0]
            old_first = B[prev]
            for k in equal_ind[1:]:
                B[prev] = B[k]
                prev = k
            B[prev] = old_first

print(A)
print(B)

Upvotes: 0

vishnu sivadasan
vishnu sivadasan

Reputation: 1

Quick and dirty, and improvements could be made, but here goes. Your requirements can be accomplished as follows:

val = [0, 1, 2, 3, 4, 5]
A = [2, 3, 1, 0, 2, 1,4,4, 2, 3, 1, 0, 4]
val_shifted = np.roll(val,1) 
dic_val = {i:val_shifted[i] for i in range(len(val_shifted))}
B = [dic_val[i] for i in A]

Which Gives the result that meets your requirement

A = [2, 3, 1, 0, 2, 1, 4, 4, 2, 3, 1, 0, 4]
B = [1, 2, 0, 5, 1, 0, 3, 3, 1, 2, 0, 5, 3]

Upvotes: 0

Ma0
Ma0

Reputation: 15204

This is somewhat wasteful as it creates a temporary list for every item in A but otherwise fullfills your requirements:

from random import choice


val = [0, 1, 2, 3, 4, 5]
A = [2, 3, 1, 0, 2, 1, 2, 3, 1, 0, 4]

val = set(val)
B = [choice(list(val - {x})) for x in A]
print(B) # -> [4, 2, 3, 2, 5, 4, 1, 5, 5, 4, 1]

In a nutshell:

What happens is that val is converted to a set from which the current item in A gets removed. Consequently, an item is chosen at random from this resulting subset and gets added to B.


You can also test it with:

print(all(x!=y for x, y in zip(A, B)))

which of course returns True


Finally, note that the approach above only works with hashable items. So if you might have something like val = [[1, 2], [2, 3], ..] for example you will run into problems.

Upvotes: 1

Related Questions