Matthew Mitchell

Reputation: 5393

OpenCL executing without input data or using 3 dimensions

Hello, I have an algorithm similar to this (in Python):

for a in xrange(10):
    for b in xrange(15):
        for c in xrange(5):
            for d in xrange(15):
                #etc

The code goes through many combinations of linear parameters. Can I, and should I, execute a kernel with no input data and just an id from which the numerical parameters can be calculated? Or should I send 3 dimensions of integer data for the first 3 parameters and then calculate the rest of the parameters within each work item?

I am not aware of any way I can run the commands with no input data and simply have an incrementing id for all the work items so I can calculate the parameters for all combinations. Is this possible? Is it recommended?

Thank you for any help.

Note: Using C libraries for OpenCL.

Upvotes: 0

Views: 318

Answers (1)

Lu4

Reputation: 15032

It is quite hard to understand what problem you have. If you are talking about kernel arguments, you should have at least one kernel argument. Kernels without kernel arguments are useless, since OpenCL provides data-based parallelism: if you don't have any data, you don't have any parallelism, and you could just as well execute your kernel on one CPU thread.

If your problem is with dimensions, i.e. you need 4 or more dimensions but OpenCL provides only 3, then you should do something like:

// Assuming that you have only a, b, c, d
// and 'amount of work' = 10 * 15 * 5 * 15

int index = get_global_id(0);
int d = index % 15; index /= 15;
int c = index % 5;  index /= 5;
int b = index % 15; index /= 15;
int a = index % 10; index /= 10;

// etc. (do something with a, b, c, d)
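To convince yourself that the flat index really visits the same combinations as the nested loops in the question, here is a minimal host-side sketch in plain C. The names `Params`, `decompose` and `matches_nested_loops` are made up for illustration; in a real kernel the `index` would come from `get_global_id(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Loop bounds from the question: a < 10, b < 15, c < 5, d < 15. */
enum { NA = 10, NB = 15, NC = 5, ND = 15, TOTAL = NA * NB * NC * ND };

/* Hypothetical helper type holding the four loop parameters. */
typedef struct { int a, b, c, d; } Params;

/* Decompose a flat index (the value get_global_id(0) would give in a
   kernel) into the four parameters, innermost parameter first. */
static Params decompose(size_t index)
{
    Params p;
    assert(index < (size_t)TOTAL);
    p.d = (int)(index % ND); index /= ND;
    p.c = (int)(index % NC); index /= NC;
    p.b = (int)(index % NB); index /= NB;
    p.a = (int)index; /* index < NA is guaranteed by the assert above */
    return p;
}

/* Walk the nested loops from the question with a running counter and
   check that decompose() gives the same (a, b, c, d) for every counter
   value. Returns 1 if everything matches, 0 otherwise. */
static int matches_nested_loops(void)
{
    size_t index = 0;
    for (int a = 0; a < NA; a++)
        for (int b = 0; b < NB; b++)
            for (int c = 0; c < NC; c++)
                for (int d = 0; d < ND; d++, index++) {
                    Params p = decompose(index);
                    if (p.a != a || p.b != b || p.c != c || p.d != d)
                        return 0;
                }
    return index == (size_t)TOTAL;
}
```

With these bounds the global work size for a 1-dimensional launch would be `TOTAL`, i.e. one work item per combination.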

One last thing: try to make your programs as flat as possible. OpenCL doesn't like lots of loops and branching logic, so try to unroll your loops by hand. Instead of:

// if it is possible to render some constant into the OpenCL code,
// then try to expand it as much as possible
// ('out' is an assumed name for the output array)

for (int i = 0; i < 4; i++) // The constant is 4
{
   float x = sin(3.14 * i + ...);
   float y = cos(x + ....);
   out[i] = a * i * x + y ....;
}

write it the following way:

float x;
float y;

x = sin(3.14 * 0 + ...);
y = cos(x + ....);
out[0] = a * 0 * x + y ....;

x = sin(3.14 * 1 + ...);
y = cos(x + ....);
out[1] = a * 1 * x + y ....;

x = sin(3.14 * 2 + ...);
y = cos(x + ....);
out[2] = a * 2 * x + y ....;

x = sin(3.14 * 3 + ...);
y = cos(x + ....);
out[3] = a * 3 * x + y ....;

The flatter the better! I'm talking about reasonable expansion: if your loop has 1024 iterations, expanding all of them is not reasonable. In that case you should unroll it by a factor of 2, 4, 8 or 16, which leaves you with 512, 256, 128 or 64 loop iterations. This can get you a huge performance boost...
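As a rough sketch of what unrolling by a factor of 4 looks like, here is the same loop written both ways in plain C. The function names and the sum-of-squares body are only placeholders for illustration, and the unrolled version assumes the iteration count is a multiple of 4. Whether this actually helps depends on the compiler and the device, so it is worth measuring both versions.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: n iterations. */
static long sum_squares_rolled(const int *v, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (long)v[i] * v[i];
    return acc;
}

/* Same work unrolled by 4: only n / 4 loop iterations.
   Assumption: n is a multiple of 4 (e.g. 1024 -> 256 iterations). */
static long sum_squares_unrolled4(const int *v, size_t n)
{
    long acc = 0;
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        acc += (long)v[i]     * v[i];
        acc += (long)v[i + 1] * v[i + 1];
        acc += (long)v[i + 2] * v[i + 2];
        acc += (long)v[i + 3] * v[i + 3];
    }
    return acc;
}
```

Both functions return the same result for any array whose length is a multiple of 4; only the number of loop iterations (and loop-condition checks) differs.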

Upvotes: 1
