user1111929

Reputation: 6097

GPU programming via JOCL uses only 6 out of 80 shader cores?

I am trying to get a program running on my GPU. As an easy starting point, I modified the first sample on http://www.jocl.org/samples/samples.html to run the following little script: I run n simultaneous "threads" (what's the correct name for the GPU equivalent of a thread?), each of which performs 20000000/n independent tanh() computations. You can see my code here: http://pastebin.com/DY2pdJzL

The speed is far from what I expected:

So beyond n=6 (be it n=8, n=20, n=100, n=1000 or n=100000), there is no further performance increase, which suggests that only 6 of these are computed in parallel. However, according to the specifications of my card there should be 80 cores: http://www.amd.com/us/products/desktop/graphics/ati-radeon-hd-5000/hd-5450-overview/pages/hd-5450-overview.aspx#2

It is not a matter of overhead, since increasing or decreasing the 20000000 only changes all the execution times by a linear factor.

I have installed the AMD APP SDK and drivers that support OpenCL: see http://dl.dropbox.com/u/3060536/prtscr.png and http://dl.dropbox.com/u/3060536/prtsrc2.png for details (or at least I conclude from these that OpenCL is running correctly).

So I'm a bit clueless now about where to look for an answer. Why can JOCL only do 6 parallel executions on my ATI Radeon HD 5450?

Upvotes: 0

Views: 405

Answers (1)

vocaro

Reputation: 2779

You are hard-coding the local work size to 1. Use a larger size or let the driver choose one for you.
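As a rough illustration of the work-size point, here is a minimal sketch in plain Java. The class name `WorkSizes`, the helper `roundUpGlobal`, and the local size of 64 are assumptions made for this example, not values taken from your code. In OpenCL 1.x, an explicitly given local work size must evenly divide the global work size, so the helper rounds the item count up to the next multiple. The JOCL call appears only in a comment, and `commandQueue` and `kernel` there stand for whatever your program already created.

```java
final class WorkSizes {

    /** Returns the smallest multiple of localSize that is >= itemCount. */
    static long roundUpGlobal(long itemCount, long localSize) {
        if (itemCount <= 0 || localSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return ((itemCount + localSize - 1) / localSize) * localSize;
    }

    public static void main(String[] args) {
        long itemCount = 20_000_000L; // total number of tanh() evaluations
        long localSize = 64L;         // assumed value; the real limit is device-specific

        long[] globalWorkSize = { roundUpGlobal(itemCount, localSize) };
        long[] localWorkSize = { localSize };

        System.out.println("global = " + globalWorkSize[0]
                + ", local = " + localWorkSize[0]);

        // With JOCL the launch would then look roughly like this
        // (commandQueue and kernel are assumed to exist already):
        //
        //   clEnqueueNDRangeKernel(commandQueue, kernel, 1, null,
        //           globalWorkSize, localWorkSize, 0, null, null);
        //
        // Passing null instead of localWorkSize lets the OpenCL
        // implementation choose a work-group size itself.
    }
}
```

If the global size is rounded up like this, the kernel needs a bounds check so that the extra work-items do nothing. The upper limit for the local size can be queried with `CL_DEVICE_MAX_WORK_GROUP_SIZE` or, per kernel, `CL_KERNEL_WORK_GROUP_SIZE`.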

Also, your kernel is not designed in an OpenCL style. You should take out the for loop and let the driver handle the iterating for you.
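To make the second point concrete, here is a hedged sketch of what a loop-free kernel could look like, written as a Java string constant in the way the JOCL samples embed their kernel source. The names `TanhKernelSource`, `tanhKernel`, `in`, `out` and `n` are hypothetical, and I am assuming one tanh() per input element rather than reproducing your pastebin code. Each work-item handles exactly one element, identified by `get_global_id(0)`, and the global work size passed at launch replaces the loop count.

```java
final class TanhKernelSource {

    // OpenCL C source: one work-item computes one output element.
    // The guard "gid < n" makes the kernel safe when the global work
    // size has been rounded up beyond the number of elements.
    static final String SOURCE =
        "__kernel void tanhKernel(__global const float *in,\n" +
        "                         __global float *out,\n" +
        "                         const int n)\n" +
        "{\n" +
        "    int gid = get_global_id(0);\n" +
        "    if (gid < n) {\n" +
        "        out[gid] = tanh(in[gid]);\n" +
        "    }\n" +
        "}\n";

    public static void main(String[] args) {
        // Print the source so it can be inspected or pasted elsewhere.
        System.out.print(SOURCE);
    }
}
```

With this shape, launching with a global work size of 20000000 (or the next multiple of the local size) gives the device millions of independent work-items to schedule across all of its compute units.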

Upvotes: 1
