Reputation: 569
I am using pyopencl to speed up my calculations using a GPU and am at the moment mystified by the following problem.
I'm doing a simple multiplication of two arrays in a for loop, using the following code:
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.elementwise import ElementwiseKernel
ctx = cl.create_some_context(0)
queue = cl.CommandQueue(ctx)
multiply = ElementwiseKernel(ctx,
"float *x, float *y, float *z",
"z[i] = x[i] * y[i]",
"multiplication")
x = cl_array.arange(queue, 1000000, dtype=np.complex64)
y = cl_array.arange(queue, 1000000, dtype=np.complex64)
z = cl_array.empty_like(x)
for n in range(10000):
    z = x*y                            # option 1
    multiply(x.real, y.real, z.real)   # option 2
    multiply(x, y, z)                  # option 3
The last three lines all perform the same multiplication. However, the first two options fail with the following error (I ran each option on its own, with the other two commented out):
pyopencl.MemoryError: clEnqueueNDRangeKernel failed: mem object allocation failure
I don't understand why the first two options run into allocation errors.
NOTES:
GPU: [0] pyopencl.Device 'Capeverde' on 'AMD Accelerated Parallel Processing' at 0x2a76d90
>>> pyopencl.VERSION
(2013, 1)
I am aware that the complex type is not handled correctly here, but I get the same problem if I change the arrays to np.float32.
Upvotes: 3
Views: 1017
Reputation: 3024
I simplified your program; the following version ran without errors on my computer:
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.elementwise import ElementwiseKernel
ctx = cl.create_some_context(0)
queue = cl.CommandQueue(ctx)
multiply = ElementwiseKernel(ctx,
"float *x, float *y, float *z",
"z[i] = x[i] * y[i]",
"multiplication")
x = cl_array.arange(queue, 1000000, dtype=np.float32)
y = cl_array.arange(queue, 1000000, dtype=np.float32)
z = cl_array.empty_like(x)
for i in range(10000):
    multiply(x, y, z)
This program runs the kernel on np.float32 buffers. Your problem may stem from the np.complex64 type, or from the fact that you call .real 30000 times, which may create a new buffer each time. It is also possible that your buffers are too large for your GPU; try reducing their size.
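As a rough check on the buffer-size idea, the footprint of the arrays in the question can be worked out on the host. The sketch below uses only NumPy; the helper name array_megabytes is made up for illustration, and the worst-case figure rests on a hypothetical assumption: that every iteration allocates a fresh result (as z = x*y does) and that none of those temporaries is released before the loop ends.

```python
import numpy as np

N = 1_000_000        # element count used in the question
ITERATIONS = 10_000  # loop count used in the question


def array_megabytes(n, dtype):
    """Size in MB (10**6 bytes) of an array holding n elements of dtype."""
    return n * np.dtype(dtype).itemsize / 1e6


complex_mb = array_megabytes(N, np.complex64)  # 8 bytes per element
float_mb = array_megabytes(N, np.float32)      # 4 bytes per element

# x, y and z together, allocated once before the loop.
working_set_mb = 3 * complex_mb

# Hypothetical worst case: one new complex64 result per iteration,
# with no temporary freed until the loop has finished.
worst_case_gb = ITERATIONS * complex_mb / 1e3

print(complex_mb, float_mb, working_set_mb, worst_case_gb)
```

Under these assumptions the three arrays themselves are small, so an allocation failure is more plausibly tied to temporaries piling up inside the loop than to the size of x, y and z alone.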
I am not sure exactly what you are aiming to do, but I strongly recommend avoiding ElementwiseKernel until you have spent a little more time working with standard PyOpenCL. ElementwiseKernel is syntactic sugar that can obscure what PyOpenCL is actually doing.
Solving your problem without ElementwiseKernel will help you understand where your data is at all times, how to manage your queue, and when to copy memory to and from the host.
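For illustration, here is a minimal sketch of the same multiplication written against the plain PyOpenCL API, assuming float32 inputs. The function name multiply_on_device and its repeats parameter are hypothetical, chosen only for this example. All three buffers are created once, before the loop, so the repeated kernel launches do not request new device memory.

```python
import numpy as np

# OpenCL C source for an explicit element-wise multiply kernel.
KERNEL_SOURCE = """
__kernel void multiplication(__global const float *x,
                             __global const float *y,
                             __global float *z)
{
    int i = get_global_id(0);
    z[i] = x[i] * y[i];
}
"""


def multiply_on_device(x_host, y_host, repeats=1):
    """Multiply two float32 arrays on the device, launching the kernel
    `repeats` times against the same three buffers (hypothetical helper)."""
    # Imported here so the module still loads on machines without OpenCL.
    import pyopencl as cl

    x_host = np.ascontiguousarray(x_host, dtype=np.float32)
    y_host = np.ascontiguousarray(y_host, dtype=np.float32)

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    # Host -> device copies happen once, when the input buffers are created.
    x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x_host)
    y_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=y_host)
    # The output buffer is allocated once and reused by every launch.
    z_buf = cl.Buffer(ctx, mf.WRITE_ONLY, x_host.nbytes)

    program = cl.Program(ctx, KERNEL_SOURCE).build()
    for _ in range(repeats):
        program.multiplication(queue, x_host.shape, None, x_buf, y_buf, z_buf)

    # Device -> host copy; enqueue_copy blocks by default for host targets.
    z_host = np.empty_like(x_host)
    cl.enqueue_copy(queue, z_host, z_buf)
    return z_host
```

It could be called as multiply_on_device(a, a, repeats=10000) with a = np.arange(1000000, dtype=np.float32). Every allocation and every host/device copy is visible in the function body, which is the kind of visibility the recommendation above is about.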
Upvotes: 1