Reputation: 13485
I have two array shapes which are broadcastable against each other,
e.g. (2, 2, 1) and (2, 3)
I want a function that takes those shapes and gives me an iterator yielding, for each element of the broadcast result, the indices into arrays of those shapes that would be broadcast together, along with the corresponding index in the output array.
ix_iter, output_shape = broadcast_indeces_iterator((2, 2, 1), (2, 3))
assert output_shape == (2, 2, 3)
for in1_ix, in2_ix, out_ix in ix_iter:
    print(in1_ix, in2_ix, out_ix)
results in output:
(0, 0, 0), (0, 0), (0, 0, 0)
(0, 0, 0), (0, 1), (0, 0, 1)
(0, 0, 0), (0, 2), (0, 0, 2)
(0, 1, 0), (1, 0), (0, 1, 0)
(0, 1, 0), (1, 1), (0, 1, 1)
(0, 1, 0), (1, 2), (0, 1, 2)
(1, 0, 0), (0, 0), (1, 0, 0)
(1, 0, 0), (0, 1), (1, 0, 1)
(1, 0, 0), (0, 2), (1, 0, 2)
(1, 1, 0), (1, 0), (1, 1, 0)
(1, 1, 0), (1, 1), (1, 1, 1)
(1, 1, 0), (1, 2), (1, 1, 2)
np.broadcast does something close, but it wants actual arrays to be created.
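For reference, here's how close np.broadcast gets: it computes the combined shape, but only from concrete arrays (they can at least be cheap, uninitialized placeholders):

```python
import numpy as np

# np.broadcast needs array-like operands; empty arrays of the right
# shapes are enough to get the combined shape
b = np.broadcast(np.empty((2, 2, 1)), np.empty((2, 3)))
print(b.shape)   # (2, 2, 3)
print(b.size)    # 12 elements in the broadcast result
```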
Upvotes: 2
Views: 427
Reputation: 231385
Here's a start:
import numpy as np

array1 = np.arange(4).reshape(2, 2, 1) * 10
array2 = np.arange(6).reshape(2, 3)
I, J = np.broadcast_arrays(array1, array2)
print(I.shape)
K = np.empty(I.shape, dtype=int)
for ijk in np.ndindex(I.shape):
    K[ijk] = I[ijk] + J[ijk]
print(K)
producing
(2, 2, 3)   # broadcast shape
[[[ 0  1  2]
  [13 14 15]]

 [[20 21 22]
  [33 34 35]]]
I has shape (2, 2, 3), but shares its data with array1 - it's a broadcast view, not a copy (look at its .__array_interface__).
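One quick way to see that sharing is to check the strides and memory overlap (stride values depend on the platform's default int size, but the broadcast axis is always 0):

```python
import numpy as np

array1 = np.arange(4).reshape(2, 2, 1) * 10
array2 = np.arange(6).reshape(2, 3)
I, J = np.broadcast_arrays(array1, array2)

# the broadcast axis gets stride 0: every index along it reads the same memory
print(I.strides)
print(np.shares_memory(I, array1))   # True - a view, not a copy
```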
You can iterate over just the first 2 dimensions by giving ndindex just that part of the shape.
K = np.empty(I.shape, dtype=int)
for i, j in np.ndindex(I.shape[:2]):
    K[i, j, :] = I[i, j, :] + J[i, j, :]
    print(K[i, j, :])
It can be refined by looking at the code for broadcast_arrays and ndindex to find the essential pieces. For example, in https://stackoverflow.com/a/25097271/901925 I call nditer directly to generate a multi_index (an operation that could be adapted to Cython).
# y is the array from that linked answer
xx = np.zeros(y.shape[:2])
it = np.nditer(xx, flags=['multi_index'])
while not it.finished:
    print(y[it.multi_index], end=' ')
    it.iternext()
# [242 14 211] [198 7 0] [235 60 81] [164 64 236]
To make 'dummy arrays' that are virtually costless, I could take a clue from ndindex and build each array from np.zeros(1):
import numpy as np
from numpy.lib.stride_tricks import as_strided

def make_dummy(shape):
    # every stride is 0, so the single element of np.zeros(1) backs
    # the whole (virtual) array
    return as_strided(np.zeros(1), shape=shape, strides=(0,) * len(shape))

array1 = make_dummy((2, 2, 1))
array2 = make_dummy((2, 3))
I could dig into np.broadcast_arrays to find out how it combines the shapes of the 2 input arrays to come up with the shape for I.
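For newer NumPy there's no need to dig: np.broadcast_shapes (available in NumPy 1.20+) does exactly that shape combination, with no arrays involved:

```python
import numpy as np

# combines shapes directly, raising on incompatible shapes;
# assumes NumPy >= 1.20
print(np.broadcast_shapes((2, 2, 1), (2, 3)))   # (2, 2, 3)
```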
There's a difference between your desired solution and mine that I've glossed over.
(0, 0, 0), (0, 0), (0, 0, 0)
(0, 0, 0), (0, 1), (0, 0, 1)
...
(1, 1, 0), (1, 1), (1, 1, 1)
(1, 1, 0), (1, 2), (1, 1, 2)
expects a different index tuple for each array: one that ranges over (2, 2, 1), another that ranges over (2, 3), etc.
My approach, which I believe is the one used by numpy's C code (at least the parts based on nditer), generates one iterator over (2, 2, 3) and massages the arrays, via as_strided, to accept that larger range. It is easier to implement a general broadcasting mechanism this way: it separates the broadcasting complexity from the calculation core.
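A minimal sketch of that one-iterator pattern, using np.broadcast_to (the public wrapper around the same as_strided trick, available in NumPy 1.10+) instead of numpy's C code:

```python
import numpy as np

x = 10 * np.arange(4).reshape(2, 2, 1)
y = 100 * np.arange(6).reshape(2, 3)

# zero-copy views: the broadcast axes get stride 0
xb = np.broadcast_to(x, (2, 2, 3))
yb = np.broadcast_to(y, (2, 2, 3))

# a single iterator over the output shape serves both inputs
out = np.empty((2, 2, 3), dtype=int)
for ijk in np.ndindex(out.shape):
    out[ijk] = xb[ijk] + yb[ijk]

print((out == x + y).all())   # True - same result as ordinary broadcasting
```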
Here's a good intro to nditer: http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
Upvotes: 1
Reputation: 879799
import numpy as np

x = 10 * np.arange(4).reshape((2, 2, 1))
y = 100 * np.arange(6).reshape((2, 3))
z = np.nditer([x, y], flags=['multi_index', 'c_index'], order='C')
for a, b in z:
    print(np.unravel_index(z.index % x.size, x.shape),
          np.unravel_index(z.index % y.size, y.shape),
          z.multi_index)
yields
((0, 0, 0), (0, 0), (0, 0, 0))
((0, 1, 0), (0, 1), (0, 0, 1))
((1, 0, 0), (0, 2), (0, 0, 2))
((1, 1, 0), (1, 0), (0, 1, 0))
((0, 0, 0), (1, 1), (0, 1, 1))
((0, 1, 0), (1, 2), (0, 1, 2))
((1, 0, 0), (0, 0), (1, 0, 0))
((1, 1, 0), (0, 1), (1, 0, 1))
((0, 0, 0), (0, 2), (1, 0, 2))
((0, 1, 0), (1, 0), (1, 1, 0))
((1, 0, 0), (1, 1), (1, 1, 1))
((1, 1, 0), (1, 2), (1, 1, 2))
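Note that the first tuples above cycle through x via z.index % x.size, which is not quite the broadcast mapping the question asked for (the second row gives (0, 1, 0) where (0, 0, 0) was expected). A variant that instead pins each size-1 axis to 0, sketched here, reproduces the requested indices:

```python
import numpy as np

x = 10 * np.arange(4).reshape((2, 2, 1))
y = 100 * np.arange(6).reshape((2, 3))

z = np.nditer([x, y], flags=['multi_index'], order='C')
for _ in z:
    out_ix = z.multi_index
    # an input's index is the trailing part of the output index,
    # with broadcast (size-1) axes pinned to 0
    x_ix = tuple(0 if s == 1 else i for s, i in zip(x.shape, out_ix[-x.ndim:]))
    y_ix = tuple(0 if s == 1 else i for s, i in zip(y.shape, out_ix[-y.ndim:]))
    print(x_ix, y_ix, out_ix)
```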
Upvotes: 3
Reputation: 13485
What a great question, Peter. Here is your answer:
import numpy as np

def get_broadcast_shape(*shapes):
    '''
    Given a set of array shapes, return the shape of the output when arrays of
    those shapes are broadcast together.
    '''
    max_ndim = max(len(s) for s in shapes)
    # left-pad each shape with 1s so they all have the same length
    equal_len_shapes = np.array([(1, ) * (max_ndim - len(s)) + s for s in shapes])
    max_dim_shapes = np.max(equal_len_shapes, axis=0)
    # every entry must be either 1 or the maximum along its axis
    assert np.all(np.bitwise_or(equal_len_shapes == 1,
                                equal_len_shapes == max_dim_shapes[None, :])), \
        'Shapes %s are not broadcastable together' % (shapes, )
    return tuple(max_dim_shapes)

def get_broadcast_indices(*shapes):
    '''
    Given a set of shapes of arrays that you could broadcast together, return:
        output_shape: The shape of the resulting output array
        broadcast_shape_iterator: An iterator that yields a len(shapes)+1 tuple
            of the indices of each input array and the corresponding index in
            the output array
    '''
    output_shape = get_broadcast_shape(*shapes)
    base_iter = np.ndindex(output_shape)

    def broadcast_shape_iterator():
        for out_ix in base_iter:
            # for each input: keep the trailing len(s) output indices,
            # pinning broadcast (size-1) axes to 0
            in_ixs = tuple(
                tuple(0 if s[i] == 1 else ix for i, ix in enumerate(out_ix[-len(s):]))
                for s in shapes)
            yield in_ixs + (out_ix, )

    return output_shape, broadcast_shape_iterator()

output_shape, ix_iter = get_broadcast_indices((2, 2, 1), (2, 3))
assert output_shape == (2, 2, 3)
for in1_ix, in2_ix, out_ix in ix_iter:
    print(in1_ix, in2_ix, out_ix)
prints
((0, 0, 0), (0, 0), (0, 0, 0))
((0, 0, 0), (0, 1), (0, 0, 1))
((0, 0, 0), (0, 2), (0, 0, 2))
((0, 1, 0), (1, 0), (0, 1, 0))
((0, 1, 0), (1, 1), (0, 1, 1))
((0, 1, 0), (1, 2), (0, 1, 2))
((1, 0, 0), (0, 0), (1, 0, 0))
((1, 0, 0), (0, 1), (1, 0, 1))
((1, 0, 0), (0, 2), (1, 0, 2))
((1, 1, 0), (1, 0), (1, 1, 0))
((1, 1, 0), (1, 1), (1, 1, 1))
((1, 1, 0), (1, 2), (1, 1, 2))
If anyone knows of a numpy builtin that solves this, though, that would be better.
Upvotes: 1