Simd
Simd

Reputation: 21233

An elegant way to make a 2d array with all possible columns

In numpy I would like to make a 2d arrray (r, by 2**r) where the columns are all possible binary columns.

For example, if height of the columns is 5, the columns would be

[0,0,0,0,0],  [0,0,0,0,1], [0,0,0,1,0], [0,0,0,1,1], [0,0,1,0,0], ...

My solution is

 np.array(list(itertools.product([0,1],repeat = c))).T

This seems very ugly. Is there a more elegant way?

Upvotes: 7

Views: 259

Answers (5)

woryzower
woryzower

Reputation: 976

What about this

   np.array([list("0"*(r-1-int(np.log2(i)))+"{0:b}".format(i))  if i>0 else [i]*r for i in range(0,2**r)]).astype(int)

EDIT

I noticed that it is unnecessary to use log2 calculation in a formatting solution:

frmt = "{0:0"+str(r)+"b}"
print [map(int,list(frmt.format(i))) for i in range(0,2**r)]

And the time difference ..

In [17]: timeit [map(int,list(frmt.format(i))) for i in range(0,2**r)]
10000 loops, best of 3: 171 µs per loop

In [18]: timeit np.array([list("0"*(r-1-int(np.log2(i)))+"{0:b}".format(i)) if i>0 else [i]*r for i in range(0,2**r)]).astype(int)
1000 loops, best of 3: 514 µs per loop

Upvotes: 1

hpaulj
hpaulj

Reputation: 231385

itertools.product is a fast C code; np.array is slower general purpose code. np.fromiter can, in the right situations, be faster. But fromiter requires a flat input, not nested lists. But another itertools does a good job of flattening lists.

Here are some iteresting time comparisons:

In [72]: timeit list(product([0,1],repeat=5))
100000 loops, best of 3: 7.03 us per loop

In [73]: timeit list(chain(*product([0,1],repeat=5)))
100000 loops, best of 3: 18.3 us per loop

In [74]: timeit np.fromiter(chain(*product([0,1],repeat=5)),int,count=160).reshape(-1,5)
10000 loops, best of 3: 33.8 us per loop

In [75]: timeit np.array(list(product([0,1],repeat=5)))
10000 loops, best of 3: 65.1 us per loop

In this case including a count win fromiter doesn't make much difference.

There is a certain Pythonic elegance in chaining generators.


By way of comparison, my time for the pure numpy method is a bit slower:

In [85]: timeit (((np.arange(2**5)[:,None] & 2**np.arange(5)[::-1]))>0).astype(int)
10000 loops, best of 3: 38.1 us per loop

But @Divakar shows this solution scales much better.

Upvotes: 1

UlfR
UlfR

Reputation: 4395

Its not numpy but fairly elegant:

r = 5
boolcols = [[y&1<<x and 1 for x in range(r)[::-1]] for y in range(2**r)]

Some clarifications of the central part could be useful:

y&1<<x and 1

This is equivalent to

(y&(1<<x)) and 1

Lets pretend x=3 and y=5 we then get:

  1. 1<<x is 1<<3 which in binary notation is 1000. A binary one is shifted 3 steps to the left. In python you could write 0b1000 or just 8. Read more
  2. y&(1<<x) is now 5&8 which is a bitwise 'AND' of 0101 and 1000 which is 0
  3. Now we are left with (0) and 1 which yields 0

NOTE: After using timeit to measure the performance of this against the other solutions, this is not the one you should select to use! Its slower in every tests I have made. Ex:

In [21]: r = 15

In [23]: %timeit [[y&1<<x and 1 for x in range(r)[::-1]] for y in range(2**r)]
10 loops, best of 3: 39.8 ms per loop

In [30]: %timeit (((np.arange(2**r)[:,None] & 2**np.arange(r)[::-1]))>0).astype(int)
1000 loops, best of 3: 1.31 ms per loop

Upvotes: 3

Divakar
Divakar

Reputation: 221574

You can use some broadcasting here, like so -

(((np.arange(2**r)[:,None] & 2**np.arange(r)[::-1]))>0).astype(int)

For r between 0 and 8, you can also use np.unpackbits -

np.unpackbits(np.arange(2**r,dtype='uint8')[:,None], axis=1)[:,8-r:]

Runtime tests -

Case #1 (Original r = 5):

In [217]: r = 5

In [218]: from itertools import product

In [219]: %timeit np.array(list(product([0,1], repeat=5)))
10000 loops, best of 3: 33.9 µs per loop

In [220]: %timeit np.unpackbits(np.arange(2**r,dtype='uint8')[:,None], axis=1)[:,8-r:]
100000 loops, best of 3: 10.6 µs per loop

In [221]: %timeit (((np.arange(2**r)[:,None] & 2**np.arange(r)[::-1]))>0).astype(int) 
10000 loops, best of 3: 31.1 µs per loop

Case #2 (Larger r):

In [242]: r = 15

In [243]: %timeit (((np.arange(2**r)[:,None] & 2**np.arange(r)[::-1]))>0).astype(int)
100 loops, best of 3: 6.6 ms per loop

In [244]: %timeit np.array(list(product([0,1], repeat=r)))
10 loops, best of 3: 77.5 ms per loop

Upvotes: 4

Martin Evans
Martin Evans

Reputation: 46759

Another non-numpy solution:

r = 3
print [map(int, list('{:0{w}b}'.format(x, w=r))) for x in xrange(2**r)]

Giving:

[[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]

Upvotes: 3

Related Questions