Reputation: 2053
I want to intercept OpenCL programs at the PTX level on NVIDIA GPUs.
I imagine the workflow would look something like this.
First, I write an OpenCL program (both host and device code) and use the NVIDIA compiler to produce the corresponding PTX code. Then I make my changes by modifying the PTX directly (please don't ask why I don't do this in the device C code; I have my reasons for it). The problem is: once the PTX has been modified, how do I compile it to binary code?
Upvotes: 1
Views: 1108
Reputation: 818
You can use ptxas, which is included in the CUDA toolkit. It compiles a .ptx file into a .cubin, which can then be loaded with the CUDA driver API (for example, with cuModuleLoad).
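As a rough sketch of the ptxas step, here is a small Python wrapper that builds and runs the command line. The file name kernel.ptx, the default architecture sm_70, and the helper names ptxas_command and assemble_ptx are assumptions for illustration only; pick the -arch value that matches your GPU.

```python
import shutil
import subprocess
from pathlib import Path


def ptxas_command(ptx_path, arch="sm_70", cubin_path=None):
    """Build the ptxas argument list that assembles a .ptx file into a .cubin.

    arch is an assumed default; it must match the target GPU.
    """
    ptx = Path(ptx_path)
    # Default output: same name as the input, with a .cubin suffix.
    out = Path(cubin_path) if cubin_path else ptx.with_suffix(".cubin")
    return ["ptxas", f"-arch={arch}", str(ptx), "-o", str(out)]


def assemble_ptx(ptx_path, arch="sm_70", cubin_path=None):
    """Run ptxas on the (modified) PTX file and return the cubin path."""
    cmd = ptxas_command(ptx_path, arch, cubin_path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("ptxas not found on PATH; is the CUDA toolkit installed?")
    # check=True raises CalledProcessError if ptxas rejects the PTX.
    subprocess.run(cmd, check=True)
    return cmd[-1]


if __name__ == "__main__":
    # Hypothetical input file; only prints the command, does not run it.
    print(" ".join(ptxas_command("kernel.ptx")))
```

Calling assemble_ptx("kernel.ptx") would run the equivalent of ptxas -arch=sm_70 kernel.ptx -o kernel.cubin, and any syntax error introduced while editing the PTX by hand shows up as a ptxas failure at this point.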
Upvotes: 1