Amogh Bhosekar

Reputation: 181

Generate binary random matrix with upper and lower limit on number of ones in each row?

I want to generate a binary matrix of numbers with M rows and N columns. Each row must sum to <=p and >=q. In other words, each row must have at most p and at least q ones.

This is the code I have been using.

import numpy as np
def randbin(M, N, P):  
    return np.random.choice([0, 1], size=(M, N), p=[P, 1 - P])

MyMatrix = randbin(200, 7, 0.5)

Notice that row 0 is all zeros:

x

I noticed that some rows have all zeros and some rows have all ones. How can I modify this to get what I want? Is there an efficient way of achieving this solution?

Upvotes: 1

Views: 829

Answers (2)

Mad Physicist

Reputation: 114230

You can generate a random number in [q, p] for each row and then set that many random ones in each row. If by efficient you mean vectorized, then yes, there is an efficient way. The trick is to simulate sampling without replacement along one axis but with replacement along the other. This can be done with np.argsort: sorting a matrix of random values along each row yields an independent random permutation of the column indices for that row. You can then select a variable number of indices per row by turning a vector of random counts into a mask.

def randbin(m, n, p, q):
    # output to assign ones into
    result = np.zeros((m, n), dtype=bool)
    # simulate sampling without replacement within each row:
    # argsort of random values gives an independent permutation per row
    col_ind = np.argsort(np.random.random(size=(m, n)), axis=1)
    # figure out how many ones to place in each row: q <= count <= p
    count = np.random.randint(q, p + 1, size=(m, 1))
    # turn it into a mask over col_ind using a clever broadcast
    mask = np.arange(n) < count
    # apply the mask not only to col_ind, but also the corresponding row_ind
    col_ind = col_ind[mask]
    row_ind = np.broadcast_to(np.arange(m).reshape(-1, 1), (m, n))[mask]
    # Set the corresponding elements to 1
    result[row_ind, col_ind] = 1
    return result

The selection is made so that each run of equal values in row_ind is between q and p elements long. The corresponding elements of col_ind are unique and uniformly distributed within each row.
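
For comparison, the idea in the first sentence of this answer (draw a count in [q, p] for each row, then place that many ones at random positions) can also be written as a plain loop. This is only a minimal sketch and not part of the original answer: the name randbin_loop, the seed parameter, and the use of np.random.default_rng are assumptions made for illustration. It is slower than a vectorized version but easy to check by eye.

```python
import numpy as np

def randbin_loop(m, n, p, q, seed=None):
    """Loop-based reference: each row gets between q and p ones (q <= p <= n)."""
    rng = np.random.default_rng(seed)
    result = np.zeros((m, n), dtype=int)
    for row in result:  # each `row` is a view into `result`
        # number of ones for this row, drawn uniformly from q..p inclusive
        k = rng.integers(q, p + 1)
        # k distinct column positions, sampled without replacement
        cols = rng.choice(n, size=k, replace=False)
        row[cols] = 1
    return result

mat = randbin_loop(200, 7, 5, 2, seed=0)
row_sums = mat.sum(axis=1)
print(row_sums.min(), row_sums.max())
```

Checking the minimum and maximum of the row sums like this is also a quick way to validate the output of any other implementation.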

An alternative is @Prune's solution. It requires np.argsort to shuffle the rows independently, since np.random.shuffle would keep the rows together:

def randbin(m, n, p, q):
    # make the unique rows: one template per allowed count q..p, ones left-aligned
    options = np.arange(n) < np.arange(q, p + 1).reshape(-1, 1)
    # select random unique row to go into each output row
    selection = np.random.choice(options.shape[0], size=m, replace=True)
    # perform the selection
    result = options[selection]
    # create indices to shuffle each row independently
    col_ind = np.argsort(np.random.random(result.shape), axis=1)
    row_ind = np.arange(m).reshape(-1, 1)
    # perform the shuffle
    result = result[row_ind, col_ind]
    return result
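
The difference between the two kinds of shuffle can be seen on a small array. The sketch below is an illustration only (the array a and the names b and c are made up for this example): np.random.shuffle reorders whole rows along the first axis, while the argsort-based indexing permutes the entries inside each row independently.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # rows: 0..3, 4..7, 8..11

# np.random.shuffle works in place along the first axis only:
# the rows change order, but the contents of each row stay together.
b = a.copy()
np.random.shuffle(b)

# argsort of random keys gives an independent permutation for each row
col_ind = np.argsort(np.random.random(a.shape), axis=1)
row_ind = np.arange(a.shape[0]).reshape(-1, 1)
c = a[row_ind, col_ind]

print(b)  # same rows as `a`, possibly in a different order
print(c)  # each row holds the same values as in `a`, in shuffled order
```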

Upvotes: 3

Prune

Reputation: 77837

Okay, then: a uniform distribution is easy enough. Let's take the case where each row needs between 2 and 5 ones, here with 6 columns. Use a list of the allowable combinations:

[ [1, 1, 0, 0, 0, 0],
  [1, 1, 1, 0, 0, 0],
  [1, 1, 1, 1, 0, 0],
  [1, 1, 1, 1, 1, 0] ]

For each of your rows, choose a random element from these four, and then shuffle it. There is your row.
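
A minimal sketch of this recipe, for illustration only: the function name random_row and its parameters are made up here, and it assumes NumPy's Generator API (np.random.default_rng) for the random choices.

```python
import numpy as np

def random_row(n, q, p, rng):
    """One row of length n with between q and p ones (q <= p <= n)."""
    # the allowable templates: q ones, q+1 ones, ..., p ones, all left-aligned
    templates = [[1] * k + [0] * (n - k) for k in range(q, p + 1)]
    # choose one template at random, then shuffle its entries in place
    row = np.array(templates[rng.integers(len(templates))])
    rng.shuffle(row)
    return row

rng = np.random.default_rng()
matrix = np.array([random_row(6, 2, 5, rng) for _ in range(10)])
print(matrix)
```

Picking the template uniformly makes the number of ones per row uniform over 2..5; it does not make every valid row equally likely, since there are more rows with 3 ones than with 5.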

Upvotes: 2
