Yoav

Reputation: 6098

Estimating OpenCL memory access performance for algorithm design

I have a task that I can accomplish using one of several possible algorithms.

Each algorithm has its own opportunities for local-memory optimization, and I would like to estimate which one will perform best by counting compute operations and memory accesses.

To compare different numbers of local memory accesses against global memory accesses, I would like to estimate the cost (in cycles?) of a local memory access (read / write) versus the cost of a global memory access.

How many cycles does it take (on a modern, consumer GPU) to perform each of these:

Note: I use "local memory" and "global memory" in their OpenCL sense.

Upvotes: 0

Views: 404

Answers (1)

Roman Arzumanyan

Reputation: 1814

Usually, an access to local memory takes a couple of GPU cycles, while an access to global memory takes tens to hundreds of cycles. The numbers differ significantly from one video card to another, so these are very rough figures that only show the difference in order of magnitude.
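To compare candidate algorithms using rough figures like these, the counting approach from the question can be sketched as a weighted sum. This is only a minimal sketch: the per-operation costs and the operation counts below are made-up placeholders, not measurements, and the model ignores latency hiding, coalescing and bank conflicts, so it is useful for ranking at best.

```python
from dataclasses import dataclass

# ASSUMED per-operation costs in cycles. Placeholders only; real values
# depend heavily on the specific GPU and on the access pattern.
ASSUMED_COST = {"compute": 1, "local": 2, "global": 50}


@dataclass
class OpCounts:
    """Per-work-item operation counts for one candidate algorithm."""
    compute: int
    local: int
    global_: int  # trailing underscore because 'global' is a Python keyword


def estimated_cycles(counts, cost=ASSUMED_COST):
    """Naive serial estimate: every operation pays its full cost."""
    return (counts.compute * cost["compute"]
            + counts.local * cost["local"]
            + counts.global_ * cost["global"])


# Hypothetical counts for two candidate algorithms.
algo_a = OpCounts(compute=100, local=0, global_=20)   # no local-memory reuse
algo_b = OpCounts(compute=120, local=40, global_=4)   # stages data in local memory

for name, counts in (("A", algo_a), ("B", algo_b)):
    print(name, estimated_cycles(counts))
```

With these placeholder values, algorithm B comes out cheaper despite doing more compute, because it replaces most global accesses with local ones. Replace the costs with numbers for your target device before drawing any conclusion.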

As I understand it, you're concerned with low-level optimization. If that's right, then you can use the tools that GPU vendors usually ship with their SDKs. Many vendors (AMD, ARM, etc.) provide offline compilers that can dump the assembly of a compiled OpenCL program together with instructions-per-cycle information. That will give you the most definite numbers.

Upvotes: 1
