Reputation: 21233
In numpy I would like to make a 2d arrray (r, by 2**r) where the columns are all possible binary columns.
For example, if height of the columns is 5, the columns would be
[0,0,0,0,0], [0,0,0,0,1], [0,0,0,1,0], [0,0,0,1,1], [0,0,1,0,0], ...
My solution is
np.array(list(itertools.product([0,1],repeat = c))).T
This seems very ugly. Is there a more elegant way?
Upvotes: 7
Views: 259
Reputation: 976
What about this
np.array([list("0"*(r-1-int(np.log2(i)))+"{0:b}".format(i)) if i>0 else [i]*r for i in range(0,2**r)]).astype(int)
EDIT
I noticed that it is unnecessary to use log2 calculation in a formatting solution:
frmt = "{0:0"+str(r)+"b}"
print [map(int,list(frmt.format(i))) for i in range(0,2**r)]
And the time difference ..
In [17]: timeit [map(int,list(frmt.format(i))) for i in range(0,2**r)]
10000 loops, best of 3: 171 µs per loop
In [18]: timeit np.array([list("0"*(r-1-int(np.log2(i)))+"{0:b}".format(i)) if i>0 else [i]*r for i in range(0,2**r)]).astype(int)
1000 loops, best of 3: 514 µs per loop
Upvotes: 1
Reputation: 231385
itertools.product
is a fast C code; np.array
is slower general purpose code. np.fromiter
can, in the right situations, be faster. But fromiter
requires a flat input, not nested lists. But another itertools
does a good job of flattening lists.
Here are some iteresting time comparisons:
In [72]: timeit list(product([0,1],repeat=5))
100000 loops, best of 3: 7.03 us per loop
In [73]: timeit list(chain(*product([0,1],repeat=5)))
100000 loops, best of 3: 18.3 us per loop
In [74]: timeit np.fromiter(chain(*product([0,1],repeat=5)),int,count=160).reshape(-1,5)
10000 loops, best of 3: 33.8 us per loop
In [75]: timeit np.array(list(product([0,1],repeat=5)))
10000 loops, best of 3: 65.1 us per loop
In this case including a count
win fromiter
doesn't make much difference.
There is a certain Pythonic elegance in chaining generators.
By way of comparison, my time for the pure numpy method is a bit slower:
In [85]: timeit (((np.arange(2**5)[:,None] & 2**np.arange(5)[::-1]))>0).astype(int)
10000 loops, best of 3: 38.1 us per loop
But @Divakar shows this solution scales much better.
Upvotes: 1
Reputation: 4395
Its not numpy but fairly elegant:
r = 5
boolcols = [[y&1<<x and 1 for x in range(r)[::-1]] for y in range(2**r)]
Some clarifications of the central part could be useful:
y&1<<x and 1
This is equivalent to
(y&(1<<x)) and 1
Lets pretend x=3 and y=5 we then get:
1<<x
is 1<<3
which in binary notation is 1000
. A binary one is shifted 3 steps to the left. In python you could write 0b1000
or just 8
. Read morey&(1<<x)
is now 5&8
which is a bitwise 'AND' of 0101
and 1000
which is 0
(0) and 1
which yields 0
NOTE: After using timeit to measure the performance of this against the other solutions, this is not the one you should select to use! Its slower in every tests I have made. Ex:
In [21]: r = 15
In [23]: %timeit [[y&1<<x and 1 for x in range(r)[::-1]] for y in range(2**r)]
10 loops, best of 3: 39.8 ms per loop
In [30]: %timeit (((np.arange(2**r)[:,None] & 2**np.arange(r)[::-1]))>0).astype(int)
1000 loops, best of 3: 1.31 ms per loop
Upvotes: 3
Reputation: 221574
You can use some broadcasting
here, like so -
(((np.arange(2**r)[:,None] & 2**np.arange(r)[::-1]))>0).astype(int)
For r
between 0
and 8
, you can also use np.unpackbits
-
np.unpackbits(np.arange(2**r,dtype='uint8')[:,None], axis=1)[:,8-r:]
Runtime tests -
Case #1 (Original r = 5
):
In [217]: r = 5
In [218]: from itertools import product
In [219]: %timeit np.array(list(product([0,1], repeat=5)))
10000 loops, best of 3: 33.9 µs per loop
In [220]: %timeit np.unpackbits(np.arange(2**r,dtype='uint8')[:,None], axis=1)[:,8-r:]
100000 loops, best of 3: 10.6 µs per loop
In [221]: %timeit (((np.arange(2**r)[:,None] & 2**np.arange(r)[::-1]))>0).astype(int)
10000 loops, best of 3: 31.1 µs per loop
Case #2 (Larger r
):
In [242]: r = 15
In [243]: %timeit (((np.arange(2**r)[:,None] & 2**np.arange(r)[::-1]))>0).astype(int)
100 loops, best of 3: 6.6 ms per loop
In [244]: %timeit np.array(list(product([0,1], repeat=r)))
10 loops, best of 3: 77.5 ms per loop
Upvotes: 4
Reputation: 46759
Another non-numpy solution:
r = 3
print [map(int, list('{:0{w}b}'.format(x, w=r))) for x in xrange(2**r)]
Giving:
[[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
Upvotes: 3