Reputation: 741
I have an array val
of possible values (ex. val = [0, 1, 2, 3, 4, 5]
) and an array A
(possibly very long list) of selected values (ex. A = [2, 3, 1, 0, 2, 1, ... , 2, 3, 1, 0, 4]
)
Now I want to create an array B
of the same length as A
such that A[i]
is different than B[i]
for each i
and entries in B
are selected randomly. How to do it efficiently using numpy?
Upvotes: 2
Views: 823
Reputation: 53029
A simple method would be drawing the difference between A and B modulo n where n is the number of possible outcomes. A[i] != B[i] means that this difference is not zero, hence we draw from 1,...,n-1:
n,N = 10,100
A = np.random.randint(0,n,N)
D = np.random.randint(1,n,N)
B = (A-D)%n
Update: while arguably elegant this solution is not the fastest. We could save some time by replacing the (slow) modulo operator with just testing for negative values and adding n to them.
In this form this solution starts looking quite similar to @Divakar's: two blocks of possible values, one needs to be shifted.
But we can do better: instead of shifting on average half the values we can instead swap them out only if A[i] == B[i]. As this is expected to happen rarely unless the list of permissible values is very short, the code runs faster:
B = np.random.randint(1,n,N)
B[B==A] = 0
Upvotes: 1
Reputation: 221514
Here's one vectorized way -
def randnum_excludeone(A, val):
n = val[-1]
idx = np.random.randint(0,n,len(A))
idx[idx>=A] += 1
return idx
The idea is we generate random integers for each entry in A
covering the entire length of val
minus 1
. Then, we add in 1
if the current random number generated is same or greater than current A
element, otherwise we keep it. Thus, for any random number generated that's lesser than current A
number, we keep it. Otherwise, with 1
addition, we will offset from the current A
number. That's our final output - idx
.
Let's verify the random-ness and make sure it's uniform across non-A elements -
In [42]: A
Out[42]: array([2, 3, 1, 0, 2, 1, 2, 3, 1, 0, 4])
In [43]: val
Out[43]: array([0, 1, 2, 3, 4, 5])
In [44]: c = np.array([randnum_excludeone(A, val) for _ in range(10000)])
In [45]: [np.bincount(i) for i in c.T]
Out[45]:
[array([2013, 2018, 0, 2056, 1933, 1980]),
array([2018, 1985, 2066, 0, 1922, 2009]),
array([2032, 0, 1966, 1975, 2040, 1987]),
array([ 0, 2076, 1986, 1931, 2013, 1994]),
array([2029, 1943, 0, 1960, 2100, 1968]),
array([2028, 0, 2048, 2031, 1929, 1964]),
array([2046, 2065, 0, 1990, 1940, 1959]),
array([2040, 2003, 1935, 0, 2045, 1977]),
array([2008, 0, 2011, 2030, 1937, 2014]),
array([ 0, 2000, 2015, 1983, 2023, 1979]),
array([2075, 1995, 1987, 1948, 0, 1995])]
Benchmarking on large arrays
Other vectorized approach(es) :
# @Paul Panzer's solution
def pp(A, val):
n,N = val[-1]+1,len(A)
D = np.random.randint(1,n,N)
B = (A-D)%n
return B
Timing results -
In [66]: np.random.seed(0)
...: A = np.random.randint(0,6,100000)
In [67]: %timeit pp(A,val)
100 loops, best of 3: 3.11 ms per loop
In [68]: %timeit randnum_excludeone(A, val)
100 loops, best of 3: 2.53 ms per loop
In [69]: np.random.seed(0)
...: A = np.random.randint(0,6,1000000)
In [70]: %timeit pp(A,val)
10 loops, best of 3: 39.9 ms per loop
In [71]: %timeit randnum_excludeone(A, val)
10 loops, best of 3: 25.9 ms per loop
Extending the range of val
to 10
-
In [60]: np.random.seed(0)
...: A = np.random.randint(0,10,1000000)
In [61]: %timeit pp(A,val)
10 loops, best of 3: 31.2 ms per loop
In [62]: %timeit randnum_excludeone(A, val)
10 loops, best of 3: 23.6 ms per loop
Upvotes: 1
Reputation: 80289
Here is another approach. B
first gets a random shuffle of A
. Then, all the values where A
and B
overlap get shuffled. In the special case where all the overlapping elements have the same value, they get swapped with random good values.
Interesting on this approach is that it also works when there A
only contains a very limited set of different values. Unlike other approaches, B
is an exact shuffle of A
, so it also works when A
doesn't have a uniform distribution. Also, B
is a completely random shuffle except for the requirement of being different at equal indices.
import random
N = 10000
A = [random.randrange(0,6) for _ in range(N)]
B = a.copy()
random.shuffle(b)
print(A)
print(B)
while True:
equal_vals = {i for i,j in zip(A, B) if i == j}
print(len(equal_vals), equal_vals)
if len(equal_vals) == 0: # finished, no equal values on same positions
break
else:
equal_ind = [k for k, (i, j) in enumerate(zip(A, B)) if i == j]
# create a list of indices where A and B are equal
random.shuffle(equal_ind) # as the list was ordened, shuffle it to get a random order
if len(equal_vals) == 1: # special case, all equal indices have the same value
special_val = equal_vals.pop()
# find all the indices where the special_val could be placed without problems
good_ind = [k for k,(i,j) in enumerate(zip(A, B)) if i != special_val and j != special_val]
if len(good_ind) < len(equal_ind):
print("problem: there are too many equal values in list A")
else:
# swap each bad index with a random good index
chosen_ind = random.sample(good_ind, len(equal_ind))
for k1, k2 in zip(equal_ind, chosen_ind):
b[k1], b[k2] = b[k2], b[k1] # swap
break
elif len(equal_vals) >= 2:
# permute B via the lis of equal indices;
# as there are at least 2 different values, at least two indices will get a desired value
prev = equal_ind[0]
old_first = B[prev]
for k in equal_ind[1:]:
B[prev] = B[k]
prev = k
B[prev] = old_first
print(A)
print(B)
Upvotes: 0
Reputation: 1
Quick and dirty, and improvements could be made, but here goes. Your requirements can be accomplished as follows:
val = [0, 1, 2, 3, 4, 5]
A = [2, 3, 1, 0, 2, 1,4,4, 2, 3, 1, 0, 4]
val_shifted = np.roll(val,1)
dic_val = {i:val_shifted[i] for i in range(len(val_shifted))}
B = [dic_val[i] for i in A]
Which Gives the result that meets your requirement
A = [2, 3, 1, 0, 2, 1, 4, 4, 2, 3, 1, 0, 4]
B = [1, 2, 0, 5, 1, 0, 3, 3, 1, 2, 0, 5, 3]
Upvotes: 0
Reputation: 15204
This is somewhat wasteful as it creates a temporary list for every item in A
but otherwise fullfills your requirements:
from random import choice
val = [0, 1, 2, 3, 4, 5]
A = [2, 3, 1, 0, 2, 1, 2, 3, 1, 0, 4]
val = set(val)
B = [choice(list(val - {x})) for x in A]
print(B) # -> [4, 2, 3, 2, 5, 4, 1, 5, 5, 4, 1]
In a nutshell:
What happens is that val
is converted to a set
from which the current item in A
gets removed. Consequently, an item is chosen at random from this resulting subset and gets added to B
.
You can also test it with:
print(all(x!=y for x, y in zip(A, B)))
which of course returns True
Finally, note that the approach above only works with hashable items. So if you might have something like val = [[1, 2], [2, 3], ..]
for example you will run into problems.
Upvotes: 1