Adrian Monk

Reputation: 73

OpenCL, Python and parallelisation

As a beginner in OpenCL, I have a basic conceptual question about optimizing GPU computing.

As far as I understand, I can take, e.g., a 1000 x 1000 matrix and run the same code on every pixel at the same time using a GPU. What about the following option:

Any advice or concepts on how to solve this in the fastest way would be appreciated.

Thanks, Adrian

Upvotes: 0

Views: 81

Answers (1)

jprice

Reputation: 9925

The OpenCL execution model revolves around kernels, which are just functions that execute once for each point in your problem domain. When you launch a kernel for execution on your OpenCL device, you define a 1-, 2- or 3-dimensional index space for this domain (aka the NDRange or global work size). It's entirely up to you how you map the NDRange onto your actual problem.
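To make that concrete, here is a minimal pure-Python sketch that emulates the execution model on the CPU. It is not the OpenCL API: `launch`, `scale_kernel` and the flat buffer layout are hypothetical names and choices made for illustration, and a real device would run the work-items in parallel rather than in a loop.

```python
from itertools import product

def launch(kernel, global_size, *args):
    """Emulate a kernel launch: call `kernel` once per point of the NDRange."""
    for global_id in product(*(range(n) for n in global_size)):
        kernel(global_id, *args)

def scale_kernel(global_id, src, dst, width, factor):
    """Work done by ONE work-item: scale a single matrix element."""
    row, col = global_id      # how the index maps to the problem is our choice
    idx = row * width + col   # flatten the 2D index into the 1D buffer
    dst[idx] = src[idx] * factor

height, width = 4, 3
src = list(range(height * width))
dst = [0] * (height * width)

# A 2-dimensional NDRange with one work-item per matrix element.
launch(scale_kernel, (height, width), src, dst, width, 2)
```

The kernel body never loops over the matrix; the index space passed to the launch decides how many times it runs and which element each invocation touches.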

For example, you could launch a 100x100x100 NDRange in order to process 100 independent 100x100 matrices. Your kernel then defines the computation for a single element of one of these matrices. Alternatively, you could launch the kernel 100 times, each time with a 100x100 NDRange, to achieve the same thing. The former is probably faster, since it avoids the overhead of multiple kernel launches.
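The two options can be compared with a small CPU emulation in plain Python. This is only a sketch under assumptions: `launch`, `square_batched`, `square_single` and the `stats` counter are hypothetical helpers, not OpenCL calls, the sizes are shrunk so it runs quickly, and it counts launches rather than measuring real launch overhead.

```python
from itertools import product

stats = {"launches": 0}

def launch(kernel, global_size, *args):
    """Emulate one kernel launch over an NDRange of the given size."""
    stats["launches"] += 1
    for global_id in product(*(range(n) for n in global_size)):
        kernel(global_id, *args)

def square_batched(global_id, data, rows, cols):
    """3D NDRange: the first index selects the matrix."""
    m, r, c = global_id
    idx = (m * rows + r) * cols + c
    data[idx] = data[idx] ** 2

def square_single(global_id, data, rows, cols, m):
    """2D NDRange: the matrix number is passed as a kernel argument."""
    r, c = global_id
    idx = (m * rows + r) * cols + c
    data[idx] = data[idx] ** 2

batch, rows, cols = 5, 4, 3   # stand-ins for 100, 100, 100
original = list(range(batch * rows * cols))

# Option 1: a single launch covering every matrix.
a = list(original)
launch(square_batched, (batch, rows, cols), a, rows, cols)
launches_batched = stats["launches"]

# Option 2: one launch per matrix.
stats["launches"] = 0
b = list(original)
for m in range(batch):
    launch(square_single, (rows, cols), b, rows, cols, m)
launches_repeated = stats["launches"]
```

Both variants produce the same result; the difference is one launch versus `batch` launches, which is where the per-launch overhead would add up on a real device.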

I strongly recommend taking a look at the OpenCL specification for more information about the OpenCL execution model. Specifically, section 3.2 has a great description of the core concepts surrounding kernel execution.

Upvotes: 1
