Reputation: 2311
I am trying to get the following, Numba-nopython compatible, function running with target='cuda'
:
@numba.jit(nopython = True)
def hermite_polynomials(X, N):
r'''
Evaluate the orthonormal Hermite polynomials on
:math:`(\mathbb{R},\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)dx)` in :math:`X\subset\mathbb{R}`
:param X: Locations of desired evaluations
:type X: One dimensional np.array
:param N: Number of polynomials
:rtype: numpy.array of shape :code:`X.shape[0] x N`
'''
out = np.zeros((X.shape[0], N))
deg = N - 1
factorial = np.ones((1,N))
for i in range(1,N):
factorial[0,i:]*=i
orthonormalizer = 1 / np.sqrt(factorial)
if deg < 1:
out = np.ones((X.shape[0], 1))
else:
out[:, 0] = np.ones((X.shape[0],))
out[:, 1] = X
for n in range(1, deg):
out[:, n + 1] = X * out[:, n] - n * out[:, n - 1]
return out * orthonormalizer
However, I don't find any example code that is both easy enough for me to understand (only Python and MATLAB experience, no computer scientist) and difficult enough to actually be helpful (I only found a+b
kind of examples).
So far, I arrived at the following function, which needs to be passed an array of ones (I couldn't define an array myself, cuda.local.array((N,1),dtype=float64)
resulted in a ConstantInferenceError
). I accepted that I have to do the multiplications entrywise, hence the additional for loops, but not even that works, as I get a Invalid usage of * with parameters (array(float64, 1d, C), float64)
error.
@numba.jit(target = 'cuda')
def hermite_polynomials2(X, N,out):
r'''
Evaluate the orthonormal Hermite polynomials on
:math:`(\mathbb{R},\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)dx)` in :math:`X\subset\mathbb{R}`
:param X: Locations of desired evaluations
:type X: One dimensional np.array
:param N: Number of polynomials
:rtype: numpy.array of shape :code:`X.shape[0] x N`
'''
deg = N-1
L = X.shape[0]
if deg == 0:
return
else:
out[:, 1] = X
for n in range(1, deg):
for j in range(L):
out[j, n + 1] = X * out[j, n] - n * out[j, n - 1]
factorial = 1
for i in range(1,N):
factorial *= i
for j in range(L):
out[j,i] /= np.sqrt(factorial)
return
How do I do the multiplication?
Upvotes: 0
Views: 584
Reputation: 72349
You probably want something like:
for j in range(L):
out[j, n + 1] = X[j] * out[j, n] - n * out[j, n - 1]
but note that the entire exercise of writing this kernel is mostly futile. Quoting from the relevant documentation:
For best performance, users should write code such that each thread is dealing with a single element at a time.
The kernel you have written will be completely serial. It will be slower than the CPU version. You will need to write the code in a very different manner for it to be of any value on a GPU.
Upvotes: 1