Jon White

Wrong Output for Array Operations Using GPU

I'm trying to write a function that performs element-wise addition, subtraction, multiplication, or division on 2-D arrays using Anaconda Accelerate, but the function keeps producing wrong answers and I'm not sure what's going on.

import numpy as np
from numba import cuda

#Define functions
ADD, SUB, MUL, DIV = 1, 2, 3, 4
@cuda.jit('void(complex64[:,:], complex64[:,:], int8)')
def math_inplace_2d_cuda(a, b, operation):
    m, n = a.shape[0], a.shape[1]
    i, j = cuda.grid(2)
    if i < m and j < n:
        if operation == ADD: a[i, j] += b[i, j]
        if operation == SUB: a[i, j] -= b[i, j]
        if operation == MUL: a[i, j] *= b[i, j]
        if operation == DIV: a[i, j] /= b[i, j]

def math_inplace_2d_host(a, b, operation):
    m, n = a.shape[0], a.shape[1]
    for i in range(m):
        for j in range(n):
            if operation == ADD: a[i, j] += b[i, j]
            if operation == SUB: a[i, j] -= b[i, j]
            if operation == MUL: a[i, j] *= b[i, j]
            if operation == DIV: a[i, j] /= b[i, j]

#Create arrays
a = np.array([[1., 2], [3, 4]])
b = a.copy()*2

a_dev = cuda.to_device(a)
b_dev = cuda.to_device(b)

#Threading
threadperblock = 32, 8
def best_grid_size(size, tpb):
    # Ceiling division: enough blocks in each dimension to cover the array.
    bpg = np.ceil(np.array(size, dtype=np.float64) / tpb).astype(np.int64).tolist()
    return tuple(bpg)
blockpergrid = best_grid_size(a_dev.shape, threadperblock)
stream = cuda.stream()

#Do operation
op = ADD
math_inplace_2d_host(a, b, op)
math_inplace_2d_cuda[blockpergrid, threadperblock, stream](a_dev, b_dev, op)
print('\nhost\n', a)
print('\ndevice\n', a_dev.copy_to_host())

With the values of a and b given above, this program produces the following output (the host and device arrays should be identical):

host
[[  3.   6.]
 [  9.  12.]]

device
[[  384.   768.]
 [ 1024.  1536.]]

When I try subtracting, I get this:

host
[[-1. -2.]
 [-3. -4.]]

device
[[ -4.65661287e-10  -1.19209290e-07]
 [ -1.19209290e-07  -1.19209290e-07]]

For multiplication:

host
[[  2.   8.]
 [ 18.  32.]]

device
[[  1.59512330e-314   1.59615943e-314]
 [  1.59672607e-314   1.59732508e-314]]

For division:

host
[[ 0.5  0.5]
 [ 0.5  0.5]]

device
[[  5.25836359e-315   5.25433420e-315]
 [  5.25481893e-315   5.25525520e-315]]


Answers (1)

JoshAdel

Your code works for me if I change the jit signature to:

@cuda.jit('void(float64[:,:], float64[:,:], int64)')

or if I change the definition of a and op to:

a = np.array([[1., 2], [3, 4]]).astype(np.complex64)
...
op = np.int8(ADD)

In the latter case, where the op is ADD, I get:

host
[[  3.+0.j   6.+0.j]
 [  9.+0.j  12.+0.j]]

device
[[  3.+0.j   6.+0.j]
 [  9.+0.j  12.+0.j]]
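
For reference, here is the complete program with that first fix applied end to end. This is only a sketch: the kernel name and the hard-coded launch configuration are mine, and it assumes a CUDA-capable GPU with Numba available:

import numpy as np
from numba import cuda

ADD = 1

# The signature now matches the actual argument types: float64 arrays
# and an int64 scalar (the width of a default Python int on 64-bit builds).
@cuda.jit('void(float64[:,:], float64[:,:], int64)')
def add_inplace_2d(a, b, operation):
    i, j = cuda.grid(2)
    if i < a.shape[0] and j < a.shape[1]:
        if operation == ADD:
            a[i, j] += b[i, j]

a = np.array([[1., 2], [3, 4]])
b = a * 2
a_dev = cuda.to_device(a)
b_dev = cuda.to_device(b)

# One 32x8 block is enough to cover a 2x2 array.
add_inplace_2d[(1, 1), (32, 8)](a_dev, b_dev, ADD)
print(a_dev.copy_to_host())  # [[ 3.  6.]
                             #  [ 9. 12.]]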

I would have expected Numba to emit a type error here, but it seems to silently cast the arguments and then compute something wrong. It might be worth raising the issue on the Numba Google group.
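
Alternatively, you can drop the explicit signature altogether and let Numba compile a specialization lazily for whatever argument types the kernel actually receives, so this kind of mismatch can't happen. A minimal sketch, assuming a Numba version that supports signature-less cuda.jit:

from numba import cuda

ADD, SUB, MUL, DIV = 1, 2, 3, 4

# With no signature, Numba infers the argument types (here float64
# arrays and an int64 scalar) on the first call and compiles a
# matching specialization for each new type combination.
@cuda.jit
def math_inplace_2d_cuda(a, b, operation):
    i, j = cuda.grid(2)
    if i < a.shape[0] and j < a.shape[1]:
        if operation == ADD:
            a[i, j] += b[i, j]
        elif operation == SUB:
            a[i, j] -= b[i, j]
        elif operation == MUL:
            a[i, j] *= b[i, j]
        elif operation == DIV:
            a[i, j] /= b[i, j]

The trade-off is a compilation pause the first time the kernel is called with a new combination of argument types.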
