Reputation: 5905
I want to compute an array of rotation matrices from given arrays of aim
and up
vectors.
For simplicity I'll assume the aim
axes will correspond to the matrices's x
components, and up
axes to the matrices's y
components.
The only way that i know of is by doing a series of cross products:
import cProfile
import numpy as np
from numpy.core.umath_tests import inner1d
normalize = lambda V: V/(inner1d(V,V)**0.5)[:,np.newaxis] # inner1d is faster than np.linalg.norm on large arrays
def vectorToMatrix(X,Y):
X = normalize(X) # make sure X is normalized
Z = normalize(np.cross(X,Y)) # Z is the normal to the XY plane
# Re-adjust Y to keep matrices orthogonal
Y = np.cross(Z,X)
return np.dstack((X,Y,Z)).swapaxes(2,1)
Running this on a million random items
np.random.seed(30)
n = 10**6
X = np.random.random((n,3))
Y = np.random.random((n,3))
M = vectorToMatrix(X,Y)
cProfile.run('vectorToMatrix(X,Y)') # 61 function calls in 0.255 seconds
I'm looking for other methods, preferably leveraging numpy/scipy
, that could help calculate the same result given by vectorToMatrix
with a performance boost.
Upvotes: 2
Views: 197
Reputation: 14399
So here's my attempt with @jit
from numba import jit, float64 as float
@jit(float[:,:](float[:,:], float[:,:]), nopython=True)
def crossj(a, b):
c = np.empty(a.shape)
for i in range(a.shape[0]):
for j in range(a.shape[1]):
c[i, j] = a[i, j-2] * b[i, j-1] - a[i, j-1] * b[i, j-2]
return c
This is quite a bit faster than @Divakar
np.allclose(np.cross(X,Y), crossj(X,Y) )
True
%timeit np.cross(X,Y)
%timeit numpy_cross_slicing(X,Y)
%timeit crossj(X,Y)
10 loops, best of 3: 75.1 ms per loop
10 loops, best of 3: 88.1 ms per loop
10 loops, best of 3: 23 ms per loop
I'll swipe the ne
-based normalize
from @Divakar, and implement a seocnd jit
for normalize(np.cross())
@jit(float[:,:](float[:,:], float[:,:]), nopython=True)
def norm_crossj(a, b):
c = np.empty(a.shape)
for i in range(a.shape[0]):
n = 0
for j in range(a.shape[1]):
c[i, j] = a[i, j-2] * b[i, j-1] - a[i, j-1] * b[i, j-2]
n += c[i,j]**2
n = sqrt(n)
for j in range(a.shape[1]):
c[i,j] /= n
return c
Once again, faster
%timeit normalize(np.cross(X,Y))
%timeit numpy_cross_norm_slicing(X,Y)
%timeit norm_crossj(X,Y)
10 loops, best of 3: 119 ms per loop
10 loops, best of 3: 104 ms per loop
10 loops, best of 3: 36.7 ms per loop
Finally:
def vectorToMatrixj(X, Y):
normalize_einsum_numexpr(X) # make sure X is normalized
#normalize_einsum_numexpr(Y) # you don't need this
Z = norm_crossj(X, Y) # Z is the normal to the XY plane
# Re-adjust Y to keep matrices orthogonal
Y = crossj3(Z, X)
return np.dstack((X,Y,Z)).swapaxes(2,1)
Not sure why @Divakar's timings seem to be so different, or why my speedups don't help more, but:
%timeit vectorToMatrix(X,Y)
%timeit vectorToMatrix1(X,Y) #Divakar
%timeit vectorToMatrix2(X,Y) #Divakar
%timeit vectorToMatrixj(X,Y)
1 loop, best of 3: 265 ms per loop
1 loop, best of 3: 319 ms per loop
1 loop, best of 3: 258 ms per loop
1 loop, best of 3: 212 ms per loop
EDIT: fully jit
ted function:
@jit(float[:,:,:](float[:,:], float[:,:]), nopython=True)
def vec2matj(a, b):
c = np.empty(a.shape + a.shape[-1:])
for i in range(a.shape[0]):
na = 0
nc = 0
for j in range(a.shape[1]):
c[i, 2, j] = a[i, j-2] * b[i, j-1] - a[i, j-1] * b[i, j-2]
na += a[i, j]**2
nc += c[i, 2, j]**2
na = sqrt(na)
nc = sqrt(nc)
for j in range(a.shape[1]):
c[i, 2, j] /= nc
c[i, 0, j] = a[i, j] / na
for j in range(a.shape[1]):
c[i, 1, j] = c[i, 2, j-2] * c[i, 0, j-1] - c[i, 2, j-1] * c[i, 0, j-2]
return c
np.allclose(vectorToMatrix(X,Y), vec2matj(X,Y))
True
%timeit vec2matj(X,Y)
%timeit vectorToMatrix(X,Y)
10 loops, best of 3: 60.8 ms per loop
1 loop, best of 3: 240 ms per loop # <- different computer than timings above
Upvotes: 2
Reputation: 221684
Three stages of marginal improvements are possible with tricks from np.einsum
and numexpr
module.
Stage #1 : Compute normalize outputs using sum-reduction
with einsum
and then leveraging numexpr
for performing squared-roots -
import numexpr as ne
def normalize_einsum_numexpr(X):
sq_sums = np.einsum('ij,ij->i',X,X)[:,None]
return ne.evaluate('X/sqrt(sq_sums)')
This would be equivalent of normalize(X)
.
Stage #2 : Get numpy.cross equivalent with slicing using the definition of cross-product
-
def numpy_cross_slicing(X,Y):
c0 = X[:,1]*Y[:,2] - X[:,2]*Y[:,1]
c1 = X[:,2]*Y[:,0] - X[:,0]*Y[:,2]
c2 = X[:,0]*Y[:,1] - X[:,1]*Y[:,0]
return np.column_stack((c0,c1,c2))
Stage #3 : Get normalized cross product with slicing and also leveraging numexpr
-
def numpy_cross_norm_slicing(X,Y):
c0 = X[:,1]*Y[:,2] - X[:,2]*Y[:,1]
c1 = X[:,2]*Y[:,0] - X[:,0]*Y[:,2]
c2 = X[:,0]*Y[:,1] - X[:,1]*Y[:,0]
s = ne.evaluate('sqrt(c0**2 + c1**2 + c2**2)')
c0 /= s
c1 /= s
c2 /= s
return np.column_stack((c0,c1,c2))
This would replace normalize(np.cross(X,Y))
.
Putting it all together, we would have the replacement for vectorToMatrix
, like so -
def vectorToMatrix1(X,Y):
X = normalize_einsum_numexpr(X) # make sure X is normalized
Y = normalize_einsum_numexpr(Y) # make sure Y is normalized
Z = numpy_cross_norm_slicing(X,Y) # Z is the normal ...
Y = numpy_cross_slicing(Z,X)
return np.dstack((X,Y,Z)).swapaxes(2,1)
Runtime test
Input setup :
In [271]: X = np.random.random((10**6,3))
...: Y = np.random.random((10**6,3))
...:
Stage #1 :
In [272]: np.allclose(normalize(X), normalize_einsum_numexpr(X))
Out[272]: True
In [273]: %timeit normalize(X)
...: %timeit normalize_einsum_numexpr(X)
...:
100 loops, best of 3: 11.4 ms per loop
100 loops, best of 3: 10.6 ms per loop
Stage #2 :
In [274]: np.allclose(np.cross(X,Y), numpy_cross_slicing(X,Y) )
Out[274]: True
In [275]: %timeit np.cross(X,Y)
...: %timeit numpy_cross_slicing(X,Y)
...:
10 loops, best of 3: 29.8 ms per loop
10 loops, best of 3: 27.9 ms per loop
Stage #3 :
In [276]: np.allclose(normalize(np.cross(X,Y)), numpy_cross_norm_slicing(X,Y))
Out[276]: True
In [277]: %timeit normalize(np.cross(X,Y))
...: %timeit numpy_cross_norm_slicing(X,Y)
...:
10 loops, best of 3: 44.5 ms per loop
10 loops, best of 3: 34.9 ms per loop
Entire code :
In [395]: np.allclose(vectorToMatrix(X,Y), vectorToMatrix1(X,Y))
Out[395]: True
In [396]: %timeit vectorToMatrix(X,Y)
10 loops, best of 3: 130 ms per loop
In [397]: %timeit vectorToMatrix1(X,Y)
10 loops, best of 3: 122 ms per loop
Hence, some marginal improvement only.
Not giving up!
Looking into the bottlenecks, the many stacking steps weren't helping. So, improving on those, a modified version ended up like this -
def vectorToMatrix2(X,Y):
X = normalize_einsum_numexpr(X) # make sure X is normalized
Y = normalize_einsum_numexpr(Y) # make sure Y is normalized
c0 = X[:,1]*Y[:,2] - X[:,2]*Y[:,1]
c1 = X[:,2]*Y[:,0] - X[:,0]*Y[:,2]
c2 = X[:,0]*Y[:,1] - X[:,1]*Y[:,0]
s = ne.evaluate('sqrt(c0**2 + c1**2 + c2**2)')
c0 /= s
c1 /= s
c2 /= s
d0 = c1*X[:,2] - c2*X[:,1]
d1 = c2*X[:,0] - c0*X[:,2]
d2 = c0*X[:,1] - c1*X[:,0]
c = [c0,c1,c2]
d = [d0,d1,d2]
return np.concatenate((X.T, d, c)).reshape(3,3,-1).transpose(2,0,1)
New timings with same million points setup -
In [505]: %timeit vectorToMatrix(X,Y) # original code
...: %timeit vectorToMatrix1(X,Y)
...: %timeit vectorToMatrix2(X,Y)
...:
10 loops, best of 3: 130 ms per loop
10 loops, best of 3: 117 ms per loop
10 loops, best of 3: 101 ms per loop
20%+
speedup, not too bad!
Upvotes: 1