Reputation: 33
I am solving a photometric stereo problem, in which I have n light sources, each with 3 channels (red, green, blue).
Thus the lights array is of shape n x 3: lights.shape = (n, 3)
I have one image per lighting condition; image dimensions are h x w (height x width), so images.shape = (n, h, w)
I want to matrix-multiply each pixel in the image stack by a matrix of shape 3 x n and get another array of shape 3 x h x w; these will be the normal vectors of each pixel of the image.
shapes of:
S = lights
S_pinv = np.linalg.inv(S.T @ S) @ S.T  # pinv is pseudo-inverse; S_pinv.shape: (n_ims, 3)
b = S_pinv @ images  # I want (3 x n) @ (n x h x w) = (3 x h x w)
But I am getting this error:
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 100 is different from 3)
Upvotes: 2
Views: 3341
Reputation: 53029
One easy way would be np.inner; inner reduces along the last axis and preserves all others; therefore it is, up to a transpose, a perfect match:
n,h,w = 10,384,512
images = np.random.randint(1,10,(n,h,w))
S_pinv = np.random.randint(1,10,(n,3))
res_inr = np.inner(images.T,S_pinv.T).T
res_inr.shape
# (3, 384, 512)
Similarly, using transposes, matmul actually does the right thing:
res_mml = (images.T@S_pinv).T
assert (res_mml==res_inr).all()
These two seem to be roughly equally fast, similar to @AndrasDeak's einsum method.
In particular, they are not as fast as reshaped matmul (unsurprising, since a single straight matmul must be one of the most optimized operations there is). They trade speed for convenience.
Upvotes: 0
Reputation: 14399
This is basically what np.einsum is for.
Instead of:
b = S_pinv @ images
Use
b = np.einsum('ij, ikl -> jkl', S_pinv, images)
In this case, i = n_ims, j = 3, k = h, and l = w.
Since this is a single contraction, you can also do it with np.tensordot():
b = np.tensordot(S_pinv.T, images, axes = 1)
or,
b = np.tensordot(S_pinv, images, axes = ([0], [0]))
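As a quick sanity check (with made-up shapes, n_ims = 4, h = 5, w = 6), the einsum and tensordot versions agree:

```python
import numpy as np

# made-up shapes: n_ims = 4, h = 5, w = 6
S_pinv = np.random.rand(4, 3)
images = np.random.rand(4, 5, 6)

# contract over the shared n_ims axis in two equivalent ways
b_einsum = np.einsum('ij, ikl -> jkl', S_pinv, images)
b_tensordot = np.tensordot(S_pinv, images, axes=([0], [0]))

print(b_einsum.shape)                      # (3, 5, 6)
print(np.allclose(b_einsum, b_tensordot))  # True
```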
Upvotes: 0
Reputation: 35080
The problem is that numpy views multidimensional arrays as stacks of matrices, where the last two dimensions are always assumed to be the linear-algebra dimensions. This means that the dot product will not work by collapsing the first dimension of your 3d array.
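To see the stacking behaviour concretely (shapes made up here): matmul contracts the last axis of its first operand with the second-to-last axis of its second, treating any leading axes as a stack:

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(10, 4, 5)  # a stack of ten (4, 5) matrices

print((A @ B).shape)  # (10, 3, 5): A multiplies each matrix in the stack

C = np.random.rand(4, 10, 5)  # contracted axis first, as in the question
try:
    A @ C  # the stacked (10, 5) matrices don't align with A's last axis
except ValueError as e:
    print('ValueError:', e)
```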
Instead, the simplest thing you can do is to reshape your 3d array into a 2d one, do the matrix multiplication, and reshape back into a 3d array. This will also make use of optimized BLAS code, which is one of the great advantages of numpy.
import numpy as np
S_pinv = np.random.rand(3, 4)
images = np.random.rand(4, 5, 6)
# error:
# (S_pinv @ images).shape
res_shape = S_pinv.shape[:1] + images.shape[1:] # (3, 5, 6)
res = (S_pinv @ images.reshape(images.shape[0], -1)).reshape(res_shape)
print(res.shape) # (3, 5, 6)
So instead of (3, n) x (n, h, w) we do (3, n) x (n, h*w) -> (3, h*w), which we reshape back to (3, h, w). Reshaping is free, because it doesn't involve any actual manipulation of the data in memory (only a reinterpretation of the single block of memory that underlies the array), and as I said, proper matrix products are highly optimized in numpy.
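One can check that the reshape really is free; the flattened array is only a view onto the same block of memory (using the toy shapes from above):

```python
import numpy as np

images = np.random.rand(4, 5, 6)
flat = images.reshape(images.shape[0], -1)  # (4, 30)

# no data was copied: both arrays share the same underlying memory
print(np.shares_memory(images, flat))  # True
```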
Since you asked for a more intuitive way, here's an alternative making use of numpy.einsum. It will probably be slower, but it's very transparent if you get a little bit used to its notation:
res_einsum = np.einsum('tn,nhw -> thw', S_pinv, images)
print(np.array_equal(res, res_einsum)) # True
This notation names each of the dimensions of the input arrays: for S_pinv the first and second dimensions are named t and n, respectively, and similarly n, h and w for images. The output is set to have dimensions thw, which means that any remaining dimensions that are not present in the output shape will be summed over after multiplying the input arrays. This is exactly what you need.
As you noted in a comment, you could also transpose your arrays so that np.dot finds the right dimensions in the right place. But this will also be slow, because it might lead to copies in memory, or at least to suboptimal looping over your arrays.
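Part of that cost is visible directly: the transpose itself is still only a view, but it is no longer C-contiguous, so the subsequent product cannot stream through memory in order:

```python
import numpy as np

images = np.random.rand(100, 200, 300)
t = images.transpose(2, 0, 1)

print(np.shares_memory(images, t))  # True: the transpose itself copies nothing
print(t.flags['C_CONTIGUOUS'])      # False: memory order no longer matches the axes
```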
I made a quick timing comparison using the following definitions:
def reshaped(S_pinv, images):
    res_shape = S_pinv.shape[:1] + images.shape[1:]
    return (S_pinv @ images.reshape(images.shape[0], -1)).reshape(res_shape)

def einsummed(S_pinv, images):
    return np.einsum('tn,nhw -> thw', S_pinv, images)

def transposed(S_pinv, images):
    return (S_pinv @ images.transpose(2, 0, 1)).transpose(1, 2, 0)
And here's the timing test using IPython's built-in %timeit magic, and some more realistic array sizes:
>>> S_pinv = np.random.rand(3, 100)
... images = np.random.rand(100, 200, 300)
... args = S_pinv, images
... %timeit reshaped(*args)
... %timeit einsummed(*args)
... %timeit transposed(*args)
5.92 ms ± 460 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
15.9 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
44.5 ms ± 329 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 3
Reputation: 564
The answer is np.swapaxes: matmul contracts over the second-to-last axis of its right operand, so put the shared axis there and swap the result's axes back.
import numpy as np
q = np.random.random([2, 5, 5])
q.shape  # (2, 5, 5): the contracted axis (size 2) comes first
w = np.random.random([3, 2])
w.shape  # (3, 2)
w @ q
and we get a ValueError, because matmul treats q as a stack of two (5, 5) matrices,
but
import numpy as np
q = np.random.random([5, 2, 5])
q.shape  # (5, 2, 5): the contracted axis (size 2) is now second-to-last
w = np.random.random([3, 2])
w.shape  # (3, 2)
res = (w @ q).swapaxes(0, 1)
res.shape  # (3, 5, 5)
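To apply this to the question's layout, where images has shape (n, h, w) with the contracted axis n first, one would swap n into the second-to-last position before the product and swap back afterwards (shapes made up here):

```python
import numpy as np

n, h, w = 4, 5, 6
S_pinv = np.random.rand(3, n)
images = np.random.rand(n, h, w)

# images.swapaxes(0, 1) has shape (h, n, w), so matmul contracts over n
res = (S_pinv @ images.swapaxes(0, 1)).swapaxes(0, 1)
print(res.shape)  # (3, 5, 6)

# agrees with the einsum formulation from the other answers
print(np.allclose(res, np.einsum('tn,nhw->thw', S_pinv, images)))  # True
```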
Upvotes: 0