PPGoodMan

Reputation: 351

Using global variables in OpenCL

I'm a newbie to OpenCL and I want to write my program in the most efficient way.

My program reads an array of floats and produces an array of floats as the result. My question is:

Is there any inefficiency in writing my calculated answer to the same buffer that I read my input from? Such as:

c[i] = c[i]*2;

where c is a float array in global memory. Can I get any performance improvement by changing the above into:

d[i] = c[i]*2;

where both c and d are float arrays in global memory?

Upvotes: 0

Views: 2430

Answers (3)

huseyin tugrul buyukisik

Reputation: 11920

Depends on the usage:

  d[i] = c[i] + b[0] + b[1] + a[0..j] + e[0..16]

                __global        __constant       __local               __private
  operands      d[i], c[i]      b[0], b[1]       a[0..j]               e[0..16]
  access count  few times       few per thread   10-1M times per item per thread
  usage         write, read     read only        random access         max reuse
  bandwidth     72 GB/s         102 GB/s         819 GB/s              4915 GB/s
  pattern       parallel        broadcast        parallel/broadcast    free to use
  capacity      2 GB/GPU        64 kB/GPU        64 kB/block           256 kB/block

The figures above are for an AMD Verde PRO, as an example.

If you are working on a mobile device, the hardware may only have __global memory. The other address space qualifiers may then be treated as just another __global, so using them could decrease performance.
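To make the address spaces a bit more concrete, here is a minimal sketch of a kernel that touches three of them. The kernel name `mix_spaces`, the array sizes, and the arithmetic are made up for illustration, and `__local` is left out to keep it short. The `#ifndef __OPENCL_VERSION__` part is only a host-side shim (my assumption, not part of OpenCL) so the same source also compiles as plain C for a quick sanity check.

```c
/* Host-side shim (illustrative assumption): when built as plain C, the
   OpenCL keywords are erased and get_global_id is emulated with a global. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __constant
#define __private
static size_t emulated_gid;   /* set by the host loop before each call */
static size_t get_global_id(unsigned dim) { (void)dim; return emulated_gid; }
#endif

/* Hypothetical kernel: one global read, a few constant reads,
   heavy reuse of a private array, one global write. */
__kernel void mix_spaces(__global const float *c,   /* large per-item input   */
                         __global float *d,         /* large per-item output  */
                         __constant float *b)       /* small, same for all    */
{
    size_t i = get_global_id(0);
    float x = c[i];              /* single read from global memory */
    __private float e[4];        /* per-work-item scratch, usually registers */
    float acc = b[0] + b[1];     /* constant memory, broadcast to all items */
    int k;

    for (k = 0; k < 4; ++k)
        e[k] = x * (float)(k + 1);
    for (k = 0; k < 4; ++k)
        acc += e[k];             /* reuse happens in private memory */

    d[i] = acc;                  /* single write to global memory */
}
```

On the host, the shim lets you call `mix_spaces` in a loop, setting `emulated_gid` before each call. On a real device you would pass the same source to the OpenCL compiler and the shim is skipped.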

Upvotes: 2

DarkZeros

Reputation: 8410

In theory you may get some improvement under some circumstances, because the array properties may help the hardware make better decisions and pipeline more effectively.

In practice, I doubt any current hardware will perform better with one or the other. They should perform the same.

If you are interested in this academically, it is worth trying it and gathering some evidence. But if you are just writing some code, the hardware will overlap I/O with computation and the I/O time will be negligible (unless the amount of computation is small, in which case you should not use OpenCL anyway).

Upvotes: 0

Lee

Reputation: 930

It is possible, depending on the device and compiler. On some devices the compiler may assume that it can use a read-only cache on the input and generate appropriate instructions to do that. This could give you locality for neighbouring reads. If you use the same array for both read and write the compiler will spot that, assume array c is read-write and disable the cache. On the other hand you have no temporal reuse in your example, so you may benefit little from the cache anyway.

I think realistically you'll have to experiment, though. There is a lot of variation in the OpenCL-supporting hardware out there.
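If you do experiment, these are the two variants I would compare, as a rough sketch. The kernel names are made up. Marking the input `const` and both pointers `restrict` is one way to tell the compiler that c is only read and does not alias d. Whether a given compiler actually uses a read-only cache because of that is device-specific, so treat it as a hint rather than a guarantee. The `#ifndef __OPENCL_VERSION__` part is just a host-side shim (an assumption for illustration) so the kernels can also be compiled as plain C to check the results.

```c
/* Host-side shim (illustrative assumption): erase OpenCL keywords and
   emulate get_global_id when this file is built as plain C. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t emulated_gid;   /* set by the host loop before each call */
static size_t get_global_id(unsigned dim) { (void)dim; return emulated_gid; }
#endif

/* In place: c is both read and written, so it has to be treated as read-write. */
__kernel void scale_in_place(__global float *c)
{
    size_t i = get_global_id(0);
    c[i] = c[i] * 2.0f;
}

/* Out of place: c is read-only and promised not to alias d, which may allow
   the loads to go through a read-only cache on some devices. */
__kernel void scale_out_of_place(__global const float *restrict c,
                                 __global float *restrict d)
{
    size_t i = get_global_id(0);
    d[i] = c[i] * 2.0f;
}
```

Both kernels produce the same values. The only difference is what the compiler is allowed to assume about c, so any timing difference you measure comes from how your device handles that. Note that the out-of-place version also needs a second buffer of the same size.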

Upvotes: 0
