Ipung Purwono

Reputation: 65

Generate random matrix in numpy without rows of all 1's

I am generating a random matrix with

np.random.randint(2, size=(5, 3))

that outputs something like

array([[0, 1, 0],
       [1, 0, 0],
       [1, 1, 1],
       [1, 0, 1],
       [0, 0, 0]])

How do I create the random matrix with the condition that each row cannot contain all 1's? That is, each row can be [1,0,0] or [0,0,0] or [1,1,0] or [1,0,1] or [0,0,1] or [0,1,0] or [0,1,1] but cannot be [1,1,1].

Thanks for your answers

Upvotes: 3

Views: 502

Answers (4)

busybear

Reputation: 10580

Here's an interesting approach:

import numpy as np

rows = np.random.randint(7, size=(6, 1), dtype=np.uint8)
np.unpackbits(rows, axis=1)[:, -3:]

Essentially, you are choosing an integer from 0-6 for each row, i.e. 000-110 in binary; 7 would be 111 (all 1's), and it is excluded by randint's exclusive upper bound. You then just need to extract the binary digits as columns and take the last 3 (your 3 columns), since np.unpackbits produces 8 bits per byte.

Output:

array([[1, 0, 1],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [0, 1, 1],
       [0, 0, 0]], dtype=uint8)
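As a quick sanity sketch (not part of the answer itself), you can draw a much larger sample and confirm the all-ones row never appears:

```python
import numpy as np

# Draw many rows as integers 0-6 (7 == 0b111 is excluded by randint's
# exclusive upper bound), then unpack each byte into 8 bits and keep
# the last 3 bits as the columns.
rows = np.random.randint(7, size=(100000, 1), dtype=np.uint8)
sample = np.unpackbits(rows, axis=1)[:, -3:]

# no row should be all ones
assert not sample.all(axis=1).any()
```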

Upvotes: 7

Paul Panzer

Reputation: 53029

@busybear beat me to it but I'll post it anyway, as it is a bit more general:

import sys
import numpy as np

def not_all(m, k):
    if k > 64 or sys.byteorder != 'little':
        raise NotImplementedError
    # draw from [0, 2**k - 1), i.e. every k-bit value except all ones,
    # then view each 64-bit integer as its 8 constituent bytes
    sample = np.random.randint(0, 2**k - 1, (m,), dtype='u8').view('u1').reshape(m, -1)
    # shift the top partial byte so unpackbits emits the bits in order
    if k % 8:
        sample[:, k // 8] <<= -k % 8
    return np.unpackbits(sample).reshape(m, -1)[:, :k]

For example:

>>> sample = not_all(1000000, 11)
# sanity checks
>>> unq, cnt = np.unique(sample, axis=0, return_counts=True)
>>> len(unq) == 2**11-1
True
>>> unq.sum(1).max()
10
>>> cnt.min(), cnt.max()
(403, 568)

And while I'm hijacking other people's answers, here is a streamlined version of @Nathan's acceptance-rejection method:

import numpy as np

def accrej(m, k):
    sample = np.random.randint(0, 2, (m, k), bool)
    # indices of rows that came out all ones
    all_ones, = np.where(sample.all(1))
    while all_ones.size:
        resample = np.random.randint(0, 2, (all_ones.size, k), bool)
        sample[all_ones] = resample
        # keep only the indices whose replacement is again all ones
        all_ones = all_ones[resample.all(1)]
    return sample.view('u1')

Upvotes: 1

def_init_

Reputation: 366

Try this solution using sum():

import numpy as np

array = np.random.randint(2, size=(5, 3))
for i, entry in enumerate(array):
    if entry.sum() == 3:  # row is all 1's
        while True:
            new = np.random.randint(2, size=(1, 3))
            if new.sum() < 3:  # accept anything except all 1's
                break
        array[i] = new

print(array)

Good luck my friend!

Upvotes: -3

Nathan

Reputation: 10306

If you always have 3 columns, one approach is to explicitly list the possible rows and then choose randomly among them until you have enough rows:

import numpy as np

# every acceptable row
choices = np.array([
    [1,0,0],
    [0,0,0],
    [1,1,0],
    [1,0,1],
    [0,0,1],
    [0,1,0],
    [0,1,1]
])

n_rows = 5
# randomly pick which type of row to use for each row needed
idx = np.random.choice(range(len(choices)), size=n_rows)

# make an array by using the chosen rows
array = choices[idx]
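For small column counts, the `choices` array can also be built programmatically rather than typed out, e.g. with `itertools.product` (a sketch, not from the original answer):

```python
import itertools
import numpy as np

n_cols = 3
# all 0/1 rows of length n_cols, except the all-ones row
choices = np.array([row for row in itertools.product([0, 1], repeat=n_cols)
                    if sum(row) < n_cols])

n_rows = 5
idx = np.random.choice(len(choices), size=n_rows)
array = choices[idx]
```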

If this needs to generalize to a large number of columns, it won't be practical to explicitly list all choices (even if you create the choices programmatically, the memory is still an issue; the number of possible rows grows exponentially in the number of columns). Instead, you can create an initial matrix and then just resample any unacceptable rows until there are none left. I'm assuming that a row is unacceptable if it consists only of 1s; it would be easy to adapt this to the case where the threshold is any number of 1s, though.

n_rows = 5
n_cols = 4

array = np.random.randint(2, size=(n_rows, n_cols))
all_1s_idx = array.sum(axis=-1) == n_cols
while all_1s_idx.any():
    array[all_1s_idx] = np.random.randint(2, size=(all_1s_idx.sum(), n_cols))
    all_1s_idx = array.sum(axis=-1) == n_cols

Here we just keep resampling all unacceptable rows until there are none left. Because all of the necessary rows are resampled at once, this should be quite efficient. Additionally, as the number of columns grows larger, the probability of a row having all 1s decreases exponentially, so efficiency shouldn't be a problem.
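As a rough back-of-envelope check of that claim (a sketch, not part of the answer): each row comes out all 1s with probability 2**-n_cols, so the expected number of rows resampled in the first pass is n_rows * 2**-n_cols, and each subsequent pass shrinks by the same factor.

```python
import numpy as np

n_rows, n_cols = 100000, 10
# each row is all ones with probability 2**-n_cols
expected_bad = n_rows * 2.0 ** -n_cols  # about 97.7 rows for these sizes

array = np.random.randint(2, size=(n_rows, n_cols))
n_bad = int((array.sum(axis=-1) == n_cols).sum())
# n_bad fluctuates around expected_bad; the second pass then expects
# only a 2**-n_cols fraction of *those* rows to redo, and so on
```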

Upvotes: 4
