Reputation: 73
As an OpenCL beginner, I have a basic conceptual question about optimizing GPU computing.
As far as I understand, I can take, e.g., a 1000 x 1000 matrix and run the same code on every element (pixel) at the same time using a GPU. What about the following option:
I have 100 matrices of 100 x 100 each and need to calculate them differently. Do I need to process them serially, or can I start 100 instances, i.e. start 100 Python multiprocessing workers that each send one matrix calculation to the GPU (assuming there are enough resources)?
The other way round: I have one 1000 x 1000 matrix and 100 different calculations to run on it. Can I do these at the same time, or do they have to be processed serially?
Any advice or concept for solving this the fastest way is appreciated.
Thanks, Adrian
Upvotes: 0
Views: 81
Reputation: 9925
The OpenCL execution model revolves around kernels, which are just functions that execute for each point in your problem domain. When you launch a kernel for execution on your OpenCL device, you define a 1, 2 or 3-dimensional index space for this domain (aka the NDRange or global work size). It's entirely up to you how you map the NDRange onto your actual problem.
For example, you could launch an NDRange that is 100x100x100 in order to process 100 matrices of 100x100 each (assuming they are all independent). Your kernel then defines the computation for a single element of one of these matrices, using one of the three global indices to identify which matrix the element belongs to. Alternatively, you could launch 100 kernels, each with a 100x100 NDRange, to achieve the same thing. The former is probably faster, since it avoids the overhead of launching multiple kernels.
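To make the two mappings concrete, here is a rough sketch in plain Python that only emulates the index space; it does not use an OpenCL runtime, and on a real device the work-items would run in parallel rather than in a loop. The `launch` helper, the `scale_3d`/`scale_2d` "kernels", and the per-matrix `factors` are all made up for illustration, and the sizes are shrunk from 100 matrices of 100x100 to 4 matrices of 3x3.

```python
from itertools import product

def launch(kernel, global_size, *args):
    """Emulate a kernel launch: call `kernel` once per point of the index space.
    A real OpenCL device runs these work-items in parallel; here they run serially."""
    for gid in product(*(range(n) for n in global_size)):
        kernel(gid, *args)

def scale_3d(gid, src, dst, factors):
    # 3-D index space: gid[0] selects the matrix, gid[1] and gid[2] the element.
    m, i, j = gid
    dst[m][i][j] = src[m][i][j] * factors[m]

def scale_2d(gid, src, dst, factor):
    # 2-D index space: one launch handles a single matrix.
    i, j = gid
    dst[i][j] = src[i][j] * factor

N_MATS, N = 4, 3  # small stand-ins for 100 matrices of 100x100
src = [[[m + i + j for j in range(N)] for i in range(N)] for m in range(N_MATS)]
factors = [m + 1 for m in range(N_MATS)]  # hypothetical per-matrix parameter

# Option 1: a single launch over an N_MATS x N x N index space.
out_single = [[[0] * N for _ in range(N)] for _ in range(N_MATS)]
launch(scale_3d, (N_MATS, N, N), src, out_single, factors)

# Option 2: N_MATS separate launches, each over an N x N index space.
out_multi = [[[0] * N for _ in range(N)] for _ in range(N_MATS)]
for m in range(N_MATS):
    launch(scale_2d, (N, N), src[m], out_multi[m], factors[m])

# Both mappings compute the same result; they differ only in launch count.
same_result = out_single == out_multi
```

Because the matrix index is available inside the kernel, each matrix can be treated differently (here via `factors[m]`) without needing a separate launch or a separate host process per matrix.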
I strongly recommend taking a look at the OpenCL specification for more information about the OpenCL execution model. Specifically, section 3.2 has a great description of the core concepts surrounding kernel execution.
Upvotes: 1