Reputation: 10162
Given the following setup:
import pyopencl as cl
import pyopencl.array as cl_array
import numpy
a = numpy.random.rand(50000).astype(numpy.float32)
mf = cl.mem_flags
What is the difference between
a_gpu = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
and
a_gpu = cl_array.to_device(self.ctx, self.queue, a)
?
And what is the difference between
result = numpy.empty_like(a)
cl.enqueue_copy(self.queue, result, result_gpu)
and
result = result_gpu.get()
?
Upvotes: 9
Views: 3647
Reputation: 1116
Buffers are CL's version of malloc, while pyopencl.array.Array is a workalike of numpy arrays on the compute device.
So for the second version of the first part of your question, you may write a_gpu + 2 to get a new array that has 2 added to each number in your array, whereas in the case of the Buffer, PyOpenCL only sees a bag of bytes and cannot perform any such operation.
The second part of your question is the same in reverse: if you've got a PyOpenCL array, .get() copies the data back and converts it into a (host-based) numpy array. Since numpy arrays are one of the more convenient ways to get contiguous memory in Python, the second variant with enqueue_copy also ends up in a numpy array. Note, however, that you could have copied this data into an array of any type and any size (as long as it's big enough): the copy is performed as a bag of bytes, whereas .get() makes sure you get the same size and type on the host.
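To make the "bag of bytes" point concrete, here is a host-only analogy using plain NumPy. No OpenCL is involved, and the variable names are made up for illustration. It mimics a raw copy by moving bytes into a destination of a different dtype and a larger size, which is the kind of mismatch a typed .get() would not produce.

```python
import numpy as np

# Source data: four float32 values, 16 bytes in total.
src = np.arange(4, dtype=np.float32)

# A destination with a different dtype and more room than needed (32 bytes).
dst = np.zeros(8, dtype=np.int32)

# Copy only the raw bytes, ignoring dtype. This stands in for a byte-wise copy.
dst.view(np.uint8)[: src.nbytes] = src.view(np.uint8)

# The first four int32 slots now hold float32 bit patterns, not the values 0..3.
# Reinterpreting those bytes as float32 recovers the original data.
reinterpreted = dst[:4].view(np.float32)
```

Read as int32, the copied region looks like garbage, because nothing in a byte-wise copy records what the bytes mean. Keeping track of dtype and shape is the job the array type does for you.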
Bonus fact: There is of course a Buffer underlying each PyOpenCL array. You can get it from the .data attribute.
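The points above can be put side by side in a small sketch. It assumes a working OpenCL runtime and a reasonably recent PyOpenCL, where to_device takes (queue, ary) rather than the (ctx, queue, ary) form shown in the question. The function name is invented for illustration, and the PyOpenCL import is deferred so that the definition loads even without an OpenCL installation.

```python
import numpy as np


def compare_buffer_and_array():
    """Contrast a raw cl.Buffer with a pyopencl.array.Array (needs an OpenCL device)."""
    # Imported here so this file can be loaded without an OpenCL runtime.
    import pyopencl as cl
    import pyopencl.array as cl_array

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    a = np.random.rand(50000).astype(np.float32)

    # Raw buffer: device memory initialised from a's bytes, with no dtype or shape.
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)

    # Array: remembers dtype and shape and supports numpy-like operators.
    a_dev = cl_array.to_device(queue, a)
    b_dev = a_dev + 2  # computed on the device, yields a new Array

    # Copying back from an Array: .get() allocates a matching host array.
    b_host = b_dev.get()

    # Copying back from a Buffer: the caller supplies the destination array.
    a_host = np.empty_like(a)
    cl.enqueue_copy(queue, a_host, a_buf)

    # The memory object underlying an Array is available as .data.
    underlying = a_dev.data

    return a, a_host, b_host, underlying
```

On a machine with an OpenCL device, calling compare_buffer_and_array() should return a_host equal to a and b_host equal to a + 2.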
Upvotes: 20
Reputation: 20287
To answer the first question, Buffer(hostbuf=...) can be called with anything that implements the buffer interface (reference). pyopencl.array.to_device(...) must be called with an ndarray (reference). ndarray implements the buffer interface and works in either place. However, only hostbuf=... would be expected to work with, for example, a bytearray (which also implements the buffer interface). I have not confirmed this, but it appears to be what the docs suggest.
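As a host-only illustration of what "implements the buffer interface" means, the sketch below uses only NumPy and the standard library (no OpenCL, names chosen for illustration). It shows that both an ndarray and a bytearray can be wrapped in a memoryview, but only the ndarray carries element type information.

```python
import numpy as np

a = np.arange(4, dtype=np.float32)

# A bytearray holding the same 16 bytes, with no notion of dtype or shape.
raw = bytearray(a.tobytes())

# Both objects expose the buffer interface, so memoryview accepts either.
view_a = memoryview(a)
view_raw = memoryview(raw)

# The ndarray reports float32 elements; the bytearray reports plain bytes.
formats = (view_a.format, view_raw.format)
item_sizes = (view_a.itemsize, view_raw.itemsize)

# The raw bytes can be reinterpreted, but the caller has to supply the dtype.
back = np.frombuffer(raw, dtype=np.float32)
```

This is why a function that only needs bytes can accept either object, while one that needs dtype and shape would reasonably insist on an ndarray.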
On the second question, I am not sure what type result_gpu is supposed to be when you call get() on it (did you mean Buffer.get_host_array()?). In any case, enqueue_copy() works between any combination of Buffer, Image and host, can have offsets and regions, and can be asynchronous (with is_blocking=False), and I think these capabilities are only available that way (whereas get() would be blocking and return the whole buffer). (reference)
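A minimal sketch of the asynchronous variant mentioned above, assuming a working OpenCL runtime. The helper name and its parameters are hypothetical; queue and device_buffer are expected to be an existing pyopencl CommandQueue and Buffer, and like is a host array with the dtype and shape you expect back.

```python
import numpy as np


def nonblocking_readback(queue, device_buffer, like):
    """Start a device-to-host copy without blocking, then wait for it explicitly."""
    # Imported here so this file can be loaded without an OpenCL runtime.
    import pyopencl as cl

    # The caller decides the host-side dtype and shape; the copy itself is byte-wise.
    host = np.empty_like(like)

    # With is_blocking=False the call returns an event instead of waiting.
    event = cl.enqueue_copy(queue, host, device_buffer, is_blocking=False)

    # Other host-side work could run here while the transfer is in flight.

    # The contents of host are only safe to read once the event has completed.
    event.wait()
    return host
```

The explicit wait is the price of the non-blocking call: reading host before the event completes may see incomplete data.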
Upvotes: 3