Reputation: 53
I have a list of 150 numbers from 0 to 149. I would like to use a for loop with 150 iterations in order to generate 150 lists of 6 numbers such that,t in each iteration k
, the number k
is included as well as 5 different random numbers. For example:
S0 = [0, r1, r2, r3, r4, r5] # r1, r2,..., r5 are random numbers between 0 and 150
S1 = [1, r1', r2', r3', r4', r5'] # r1', r2',..., r5' are new random numbers between 0 and 150
...
S149 = [149, r1'', r2'', r3'', r4'', r5'']
In addition, the numbers in each list have to be different and with a minimum distance of 5. This is the code I am using:
import random
import numpy as np
final_list = []
for k in range(150):
S = [k]
for it in range(5):
domain = [ele for ele in range(150) if ele not in S]
d = 0
x = k
while d < 5:
d = np.Infinity
x = random.sample(domain, 1)[0]
for ch in S:
if np.abs(ch - x) < d:
d = np.abs(ch - x)
S.append(x)
final_list.append(S)
Output:
[[0, 149, 32, 52, 39, 126],
[1, 63, 16, 50, 141, 79],
[2, 62, 21, 42, 35, 71],
...
[147, 73, 38, 115, 82, 47],
[148, 5, 78, 115, 140, 43],
[149, 36, 3, 15, 99, 23]]
Now, the code is working but I would like to know if it's possible to force that number of repetitions that each number has through all the iterations is approximately the same. For example, after using the previous code, this plot indicates how many times each number has appeared in the generated lists:
As you can see, there are numbers that have appeared more than 10 times while there are others that have appeared only 2 times. Is it possible to reduce this level of variation so that this plot can be approximated as a uniform distribution? Thanks.
Upvotes: 1
Views: 896
Reputation: 4427
You can force it to be precisely uniform, if you so desire.
Apologies for the mix of globals and locals, this seemed the most readable. You would want to rewrite according to how variable your constants are =)
import random
SIZE = 150
SAMPLES = 5
def get_samples():
pool = list(range(SIZE)) * SAMPLES
random.shuffle(pool)
items = []
for i in range(SIZE):
selection, pool = pool[:SAMPLES], pool[SAMPLES:]
item = [i] + selection
items.append(item)
return items
Then you will have exactly 5 of each (and one more in the leading position, which is a weird data structure).
>>> set(collections.Counter(vv for v in get_samples() for vv in v).values())
{6}
The method above does not guarantee the last 5 numbers are unique, in fact, you would expect ~10/150 to have a duplicate. If that is important, you need to filter your distribution a little more and decide how well you value tight uniformity, duplicates, etc.
If your numbers are approximately what you gave above, you also can patch up the results (fairly) and hope to avoid long search times (not the case for SAMPLES
sizes closer to OPTIONS
size)
def get_samples():
pool = list(range(SIZE)) * SAMPLES
random.shuffle(pool)
i = 0
while i < len(pool):
if i % SAMPLES == 0:
seen = set()
v = pool[i]
if v in seen: # swap
dst = random.choice(range(SIZE))
pool[dst], pool[i] = pool[i], pool[dst]
i = dst - dst % SAMPLES # Restart from swapped segment
else:
seen.add(v)
i += 1
items = []
for i in range(SIZE):
selection, pool = pool[:SAMPLES], pool[SAMPLES:]
assert len(set(selection)) == SAMPLES, selection
item = [i] + selection
items.append(item)
return items
This will typically take less than 5 passes through to clean up any duplicates, and should leave all arrangements satisfying your conditions equally likely.
Upvotes: 0
Reputation: 2055
First, I am not sure that your assertion that the current results are not uniformly distributed is necessarily correct. It would seem prudent to me to try and examine the histogram over several repetitions of the process, rather than just one.
I am not a statistician, but when I want to approximate uniform distribution (and assuming that the functions in random
provide uniform distribution), what I try to do is to simply accept all results returned by random
functions. For that, I need to limit the choices given to these functions ahead of calling them. This is how I would go about your task:
import random
import numpy as np
N = 150
def random_subset(n):
result = []
cands = set(range(N))
for i in range(6):
result.append(n) # Initially, n is the number that must appear in the result
cands -= set(range(n - 4, n + 5)) # Remove candidates less than 5 away
n = random.choice(list(cands)) # Select next number
return result
result = np.array([random_subset(n) for n in range(N)])
print(result)
Simply put, whenever I add a number n
to the result set, I take out of the selection candidates, an environment of the proper size, to ensure no number of a distance of less than 5 can be selected in the future.
The code is not optimized (multiple set
to list
conversions) but it works (as per my uderstanding).
Upvotes: 1