petRUShka

Reputation: 10162

How to use arrays in program (global) scope in OpenCL

AMD OpenCL Programming Guide, Section 6.3 Constant Memory Optimization:

Globally scoped constant arrays. These arrays are initialized, globally scoped, and in the constant address space (as specified in section 6.5.3 of the OpenCL specification). If the size of an array is below 64 kB, it is placed in hardware constant buffers; otherwise, it uses global memory. An example of this is a lookup table for math functions.

I want to use such a "globally scoped constant array". I have the following code in plain C:

#define SIZE 101
int *reciprocal_table;

int reciprocal(int number){
  return reciprocal_table[number];
}

void kernel(int *output)
{
  for(int i=0; i < SIZE; i++)
    output[i] = reciprocal(i);
}

I want to port it to OpenCL:

__kernel void kernel(__global int *output){
  int gid = get_global_id(0);

  output[gid] = reciprocal(gid);
}

int reciprocal(int number){
  return reciprocal_table[number];
}

What should I do with the global variable reciprocal_table? If I try to add __global or __constant to it, I get an error:

global variable must be declared in addrSpace constant

I don't want to pass __constant int *reciprocal_table from the kernel to reciprocal. Is it possible to initialize the global variable somehow? I know that I can write the values out in the code, but is there another way?

P.S. I'm using AMD OpenCL

UPD The code above is just an example. My real code is much more complex and has a lot of functions, so I want an array in program scope that all of them can use.

UPD2 Changed example code and added citation from Programming Guide

Upvotes: 1

Views: 10449

Answers (3)

Matthias Holzapfel

Reputation: 540

#define SIZE 2
int constant array[SIZE] = {0, 1};

kernel void
foo (global int* input,
     global int* output)
{
    const uint id = get_global_id (0);
    output[id] = input[id] + array[id];
}

I can get the above to compile with both the Intel and the AMD implementation. It also compiles without the array initializer, but then you would not know what the array contains, and since it is in the constant address space, you could not assign any values to it.

Program global variables have to be in the __constant address space, as stated by section 6.5.3 in the standard.

UPDATE Now that I fully understand the question:

One thing that worked for me is to define the array in the constant address space and then pass a kernel parameter constant int* with the same name, which overwrites the array. That produced correct results only on the GPU device. The AMD CPU device and the Intel CPU device did not overwrite the array's address. It is also probably not compliant with the standard.

Here's how it looks:

#define SIZE 2
int constant foo[SIZE] = {100, 100};

int
baz (int i)
{
  return foo[i];
}

kernel void
bar (global int* input,
     global int* output,
     constant int* foo)
{
    const uint id = get_global_id (0);
    output[id] = input[id] + baz (id);
}

For input = {2, 3} and foo = {0, 1} this produces {2, 4} on my HD 7850 device (Ubuntu 12.10, Catalyst 9.0.2). But on the CPU I get {102, 103} with either OpenCL implementation (AMD, Intel). I cannot stress enough that I personally would NOT do this, because it is only a matter of time before it breaks.

Another way to achieve this would be to generate .h files containing the array definition on the host at runtime (or to predefine them) and pass them to the kernel at compile time via a compiler option. This, of course, requires recompiling the clProgram/clKernel for every different LUT.
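A minimal sketch of the compiler-option variant, assuming the kernel source declares the table as `constant int array[SIZE] = { LOOKUP };` and that the OpenCL compiler accepts a comma-separated value in a `-D name=definition` option. The helper name `build_lut_options` is hypothetical and not part of any OpenCL API; it only assembles the option string on the host.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "-D SIZE=<n> -D LOOKUP=v0,v1,..." into opts.
 * Returns the string length, or -1 if opts (capacity cap) is too small. */
int build_lut_options(char *opts, size_t cap, const int *values, int n)
{
    assert(opts != NULL && values != NULL && n > 0);

    int len = snprintf(opts, cap, "-D SIZE=%d -D LOOKUP=", n);
    if (len < 0 || (size_t)len >= cap)
        return -1;

    for (int i = 0; i < n; i++) {
        size_t room = cap - (size_t)len;
        /* No spaces between values, so the whole list stays one option token. */
        int w = snprintf(opts + len, room, i ? ",%d" : "%d", values[i]);
        if (w < 0 || (size_t)w >= room)
            return -1;
        len += w;
    }
    return len;
}
```

The resulting string would be passed as the `options` argument of `clBuildProgram`, so a different LUT means a different option string and a rebuild of the program.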

Upvotes: 4

nat chouf

Reputation: 756

I struggled to get this to work in my own program some time ago. I did not find any way to initialize a constant or program-scope array from the host via clEnqueueWriteBuffer or similar. The only way is to write it explicitly in your .cl source file.

So my trick for initializing it from the host relies on the fact that you compile your source from the host, which means you can alter your src.cl file before compiling it.

First my src.cl file reads:

__constant double lookup[SIZE] = { LOOKUP };    // precomputed table (in constant memory).

double func(int idx) {
  return lookup[idx];
}

__kernel void ker1(__global double *in, __global double *out)
{
   ... do something ...
   double t = func(i);
   ...
}

Notice that the lookup table is initialized with LOOKUP.

Then, in the host program, before compiling your OpenCL code:

  • compute the values of the lookup table in host_values[]
  • on the host, run something like:

    char *buf = (char*) malloc( 10000 );
    int count = sprintf(buf, "#define LOOKUP ");    // actual source generation !
    for (int i=0;i<SIZE;i++) count += sprintf(buf+count, "%g, ",host_values[i]);
    count += sprintf(buf+count,"\n");
    
  • then read the content of your source file src.cl and place it right at buf+count.

  • you now have a source buffer with an explicitly defined lookup table that you just computed on the host.
  • compile your buffer with something like clCreateProgramWithSource(context, 1, (const char **) &buf, &src_sz, &err); (assuming err is a cl_int), followed by clBuildProgram
  • voilà !
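The steps above can be sketched as a single host-side helper. This is only an illustration under stated assumptions: `prepend_lookup` is a hypothetical name, the kernel source is assumed to be already loaded into a string, and the values are assumed to be finite (an `inf` or `nan` would not print as a valid literal). It uses `snprintf` with a computed buffer size instead of a fixed 10000-byte buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed string consisting of
 * "#define LOOKUP v0, v1, ...\n" followed by kernel_src.
 * The caller frees the result. Returns NULL if allocation fails. */
char *prepend_lookup(const char *kernel_src, const double *values, int n)
{
    assert(kernel_src != NULL && values != NULL && n > 0);

    /* 32 bytes per value is a generous bound for "%.17g" plus a separator. */
    size_t cap = 32 + (size_t)n * 32 + strlen(kernel_src) + 1;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    size_t len = (size_t)snprintf(buf, cap, "#define LOOKUP ");
    for (int i = 0; i < n; i++) {
        /* %.17g keeps enough digits to round-trip a double. */
        len += (size_t)snprintf(buf + len, cap - len, "%.17g%s",
                                values[i], i + 1 < n ? ", " : "\n");
    }
    /* Append the unmodified kernel source after the generated #define. */
    snprintf(buf + len, cap - len, "%s", kernel_src);
    return buf;
}
```

The returned string can then be handed to clCreateProgramWithSource in place of the original file contents.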

Upvotes: 1

Dithermaster

Reputation: 6333

It looks like "array" is a look-up table of sorts. You'll need to clCreateBuffer and clEnqueueWriteBuffer so the GPU has a copy of it to use.

Upvotes: -1
