tiao
tiao

Reputation: 805

Generate random matrix with every number 0..k

Given an integer k, I am looking for a pythonic way to generate a nxm matrix (or nested list) which has every integer from 0..k-1 but no integer appears more than once in each row.

Currently I'm doing something like this

random.sample(list(combinations(xrange(k), m)), n)

but this does not guarantee every number from 0..k-1 is included, only that no integer appears more than once in each row. Also this has combinatorial complexity which is obviously undesirable.

Thanks.

Upvotes: 2

Views: 551

Answers (4)

rici
rici

Reputation: 241681

The efficiency of the following depends on the relative values of k, n and m, but if you know that n*m is much larger than k, it is probably as fast as you will get. It also has the virtues of simplicity and lack of bias:

from functools import reduce
from itertools import chain
from operator import ior
from random import sample

def gen(k,m,n):
  if m > k or k > m*n:
    raise ValueError("Unsatisfiable constraint")
  while True:
    mat = [sample(range(k), m) for i in range(n)]
    if reduce(ior, (1<<i for i in chain(*mat))) == (1<<k) - 1:
      yield mat

The generator gen repeatedly computes a list of n lists of length m where each member list is composed of unique elements of range(k) and until the n*m elements of the combined list include all elements of range(k). Once that constraint is satisfied, the successful matrix is yielded, and the cycle continues on the next iteration.

It's possible that a large number of candidate matrices will need to be generated in order to find one which satisfies the constraint. In general, large values of k and m are bad. For example, with k=10, n=4 and m=6, it's rarely necessary to generate more than two matrices, and often the first one works. However, for k=100, n=40, m=6, hundreds of matrices are thrown away for every successful one, and for k=100, n=4, m=60, it takes tens of thousands of tries to find a compliant matrix.

Upvotes: 1

MariusSiuram
MariusSiuram

Reputation: 3634

Take each number 1..k and assign it to a certain line of the matrix.

For each line of the matrix, fill the gaps without repetitions. Shuffle each line also to keep randomness.

It seems pretty straightforward and efficient.

Edit:

import random

k = 10
kset = set(range(k))
m = 4
n = 5

matrix = []
for i in range(m):
    matrix.append(set())

for i in range(k):
    matrix[random.randint(0,m-1)].add(i)

for i in range(m):
    presents = matrix[i]
    newelements = random.sample(kset-presents, n-len(presents))
    matrix[i] = random.sample( matrix[i] | set(newelements), n)

Upvotes: 0

btilly
btilly

Reputation: 46399

You want to generate a random n*m matrix of integers 1..k with every integer used, and no integer used twice in any row. And you want to do it efficiently.

If you just want to generate a reasonable answer, reasonably quickly, you can generate the rows by taking a random selection of elements, and putting them into a random order. numpy.random.random_sample and numpy.random.shuffle can do that. You will avoid the duplicate element issue. If you fail to use all of your elements, then what you can do is randomly "evolve" this towards a correct solution, At every step identify all elements that are repeated more than once in the matrix, randomly select one, and convert it to an as yet unused integer from 1..k. This will not cause duplications within rows, and will in at most k steps give you a matrix of the desired form.

Odds are that this is a good enough answer for what you want, and is what you should do. But it is imperfect - matrices of this form do not all happen with exactly equal probability. (In particular ones with lots of elements only appearing once show up slightly more than they should.) If you need a perfectly even distribution, then you're going to have to do a lot more work.

To get there you need a bit of theory. If you have that theory, then you can understand the answer as, "Do a dynamic programming solution forwards to find a count of all possible solutions, then run that backwards making random decisions to identify a random solution." Odds are that you don't have that theory.

I'm not going to give a detailed explanation of that theory. I'll just outline what you do.

You start with the trivial statement, "There are n!/(n-m)! ways in which I could have a matrix with 1 row satisfying my condition using m of the k integers, and none which use more."

For i from 1..n, for j from m to k, you figure out the count of ways in which you could build i rows using j of the k integers. You ALSO keep track of how many of those ways came from which previous values for j for the previous row. (You'll need that later.) This step can be done in a double loop.

Note that the value in the table you just generated for j=k and i=n is the number of matrices that satisfy all of your conditions. We'll build a random one from the bottom up.

First you generate a random row for the last row of your matrix - all are equally likely.

For each row until you get to the top, you use the table you built to randomly decide how many of the elements that you used in the last row you generated will never be used again. Randomly decide which those elements will be. Generate a random row from the integers that you are still using.

When you get to the top you'll have chosen a random matrix meeting your description, with no biases in how it was generated.

Upvotes: 1

morrna
morrna

Reputation: 106

Have you tried using numpy? This simple code using numpy.random.shuffle will give you a random list containing integers 0 to k exactly once:

import numpy as np
import numpy.random as npr

k = 7 # or whatever you like
thisrow = np.arange(k)
npr.shuffle(thisrow)

You can repeat this as many times as you like to get a matrix.

import numpy as np
import numpy.random as npr

k = 7 # row length
m = 23 # number of rows
mymatrix = np.zeros((m, k))

for i in range(m):
    mymatrix[i] = np.arange(k)
    npr.shuffle(mymatrix[i])

I agree with CommuSoft above that you have underspecified the relationships between k, m, and n in your question. A row that contains every integer 0..k-1 in random order exactly once has length k. Perhaps the example you gave fails to give every integer in the range because n<k?

Upvotes: 2

Related Questions