Reputation: 79268
When you write shaders and such in WebGL or CUDA, how is that code actually translated into GPU instructions?
I want to learn how you can write super low-level code that optimizes graphic rendering to the extreme, in order to see exactly how GPU instructions are executed, at the hardware/software boundary.
I understand that, for CUDA for example, you buy their graphics card (GPU), which is somehow implemented to optimize graphics operations. But then how do you program on top of that (in a general sense), without C?
The reason for this question is because on a previous question, I got the sense that you can't program the GPU directly by using assembly, so I am a bit confused.
If you look at docs like CUDA by example, that's all just C code (though they do have things like cudaMalloc
and cudaFree
, which I don't know what that's doing behind the scenes). But under the hood, that C must be being compiled to assembly or at least machine code or something, right? And if so, how is that accessing the GPU?
Basically I am not seeing how, at a level below C or GLSL, how the GPU itself is being instructed to perform operations. Can you please explain? Is there some snippet of assembly that demonstrates how it works, or anything like that? Or is there another set of some sort of "GPU registers" in addition to the 16 "CPU registers" on x86 for example?
Upvotes: 2
Views: 2093
Reputation: 64904
The GPU driver compiles it to something the GPU understands, which is something else entirely than x86 machine code. For example, here's a snippet of AMD R600 assembly code:
00 ALU: ADDR(32) CNT(4) KCACHE0(CB0:0-15)
0 x: MUL R0.x, KC0[0].x, KC0[1].x
y: MUL R0.y, KC0[0].y, KC0[1].y
1 z: MUL R0.z, KC0[0].z, KC0[1].z
w: MUL R0.w, KC0[0].w, KC0[1].w
01 EXP_DONE: PIX0, R0
END_OF_PROGRAM
The machine code version of that would be executed by the GPU. The driver orchestrates the transfer of the code to the GPU and instructs it to run it. That is all very device specific, and in the case of nvidia, undocumented (at least, not officially documented).
The R0
in that snippet is a register, but on GPUs registers usually work a bit differently. They exist "per thread", and are in a way a shared resource (in the sense that using many registers in a thread means that fewer threads will be active at the same time). In order to have many threads active at once (which is how GPUs tolerate memory latency, whereas CPUs use out of order execution and big caches), GPUs usually have tens of thousands of registers.
Upvotes: 3
Reputation: 45332
Those languages are translated to machine code via a compiler. That compiler just is part of the drivers/runtimes of the various APIs, and is totally implementation specific. There are no families of common instruction sets we are used to in CPU land - like x86, arm or whatever. Different GPUs all have their own incompatible insruction set. Furthermore, there are no APIs with which to upload and run arbitrary binaries on those GPUs. And there is little publically available documentation for that, depending on the vendor.
The reason for this question is because on a previous question, I got the sense that you can't program the GPU directly by using assembly, so I am a bit confused.
Well, you can. In theory, at least. If you do not care about the fact that your code will only work on a small family of ASICs, and if you have all the necessary documentation for that, and if you are willing to implement some interface to the GPU allowing to run those binaries, you can do it. If you want to go that route, you could look at the Mesa3D project, as it provides open source drivers for a number of GPUs, including an llvm-based compiler infrastructure to generate the binaries for the particular architecture.
In practice, there is no useful way of bare metal GPU programming on a large scale.
Upvotes: 3