JCuda: reusing an already used pointer

I am having trouble working with JCuda. My task is to compute a 1D FFT with the CUFFT library and then multiply the result by 2. So I decided to perform a 1D FFT of type CUFFT_R2C. The class responsible for this is the following:

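import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUdeviceptr;
import jcuda.jcufft.JCufft;
import jcuda.jcufft.cufftHandle;
import jcuda.jcufft.cufftType;
import jcuda.runtime.JCuda;
import jcuda.runtime.cudaMemcpyKind;

public class FFTTransformer {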

    private Pointer inputDataPointer;

    private Pointer outputDataPointer;

    private int fftType;

    private float[] inputData;

    private float[] outputData;

    private int batchSize = 1;

    public FFTTransformer (int type, float[] inputData) {
        this.fftType = type;
        this.inputData = inputData;
        inputDataPointer = new CUdeviceptr();

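        JCuda.cudaMalloc(inputDataPointer, inputData.length * Sizeof.FLOAT);
        JCuda.cudaMemcpy(inputDataPointer, Pointer.to(inputData),
                inputData.length * Sizeof.FLOAT, cudaMemcpyKind.cudaMemcpyHostToDevice);

        // R2C output: n / 2 + 1 complex values, i.e. n + 2 floats for even n
        outputDataPointer = new CUdeviceptr();
        JCuda.cudaMalloc(outputDataPointer, (inputData.length + 2) * Sizeof.FLOAT);
    }
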

    public Pointer getInputDataPointer() {
        return inputDataPointer;
    }

    public Pointer getOutputDataPointer() {
        return outputDataPointer;
    }

    public int getFftType() {
        return fftType;
    }

    public void setFftType(int fftType) {
        this.fftType = fftType;
    }

    public float[] getInputData() {
        return inputData;
    }

    public int getBatchSize() {
        return batchSize;
    }

    public void setBatchSize(int batchSize) {
        this.batchSize = batchSize;
    }

    public float[] getOutputData() {
        return outputData;
    }

    private void R2CTransform() {

        cufftHandle plan = new cufftHandle();

        JCufft.cufftPlan1d(plan, inputData.length, cufftType.CUFFT_R2C, batchSize);

        JCufft.cufftExecR2C(plan, inputDataPointer, outputDataPointer);

        // Release the plan; a new one is created on every call
        JCufft.cufftDestroy(plan);
    }

    private void C2CTransform(){

        cufftHandle plan = new cufftHandle();

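        // Note: for CUFFT_C2C, nx = inputData.length counts complex elements,
        // so 2 * inputData.length floats are read and written, which is more
        // than the buffers allocated in the constructor can hold.
        JCufft.cufftPlan1d(plan, inputData.length, cufftType.CUFFT_C2C, batchSize);

        JCufft.cufftExecC2C(plan, inputDataPointer, outputDataPointer, fftType);

        JCufft.cufftDestroy(plan);
    }
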

    public void transform(){
        if (fftType == JCufft.CUFFT_FORWARD) {
            R2CTransform();
        } else {
            C2CTransform();
        }
    }

    public float[] getFFTResult() {
        outputData = new float[inputData.length + 2];
        JCuda.cudaMemcpy(Pointer.to(outputData), outputDataPointer,
                outputData.length * Sizeof.FLOAT, cudaMemcpyKind.cudaMemcpyDeviceToHost);
        return outputData;
    }

    public void releaseGPUResources(){
        JCuda.cudaFree(inputDataPointer);
        JCuda.cudaFree(outputDataPointer);
    }

    public static void main(String... args) {
        float[] inputData = new float[65536];
        for(int i = 0; i < inputData.length; i++) {
            inputData[i] = (float) Math.sin(i);
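        }
        FFTTransformer transformer = new FFTTransformer(JCufft.CUFFT_FORWARD, inputData);
        // Execute the FFT before reading the result back
        transformer.transform();
        float[] result = transformer.getFFTResult();

        HilbertSpectrumTicksKernelInvoker.multiplyOn2(transformer.getOutputDataPointer(), inputData.length+2);

        transformer.releaseGPUResources();
    }
}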

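To check the GPU output against something independent, a CPU reference can help. The sketch below is a minimal one under stated assumptions: R2CReference and its methods are hypothetical helpers, not part of JCuda or CUFFT, and the naive O(n^2) loop is only meant for small test inputs. It computes the unnormalized forward transform X[k] = sum over j of x[j] * exp(-2*pi*i*j*k/n) for k = 0..n/2, i.e. the n/2 + 1 non-redundant outputs of an R2C transform, stored as interleaved (re, im) floats. For even n that is n + 2 floats, which matches the size of the output buffer allocated in the constructor.

```java
/**
 * Hypothetical CPU reference for a real-to-complex forward FFT.
 * Output layout: [re0, im0, re1, im1, ..., re(n/2), im(n/2)], unnormalized.
 */
final class R2CReference {

    /** Number of complex outputs of a real-to-complex transform of n reals. */
    static int complexOutputCount(int n) {
        return n / 2 + 1;
    }

    /** Naive O(n^2) forward DFT of real input, using the exp(-2*pi*i*j*k/n) convention. */
    static float[] forwardR2C(float[] x) {
        int n = x.length;
        int m = complexOutputCount(n);
        float[] out = new float[2 * m];
        for (int k = 0; k < m; k++) {
            double re = 0.0, im = 0.0;
            for (int j = 0; j < n; j++) {
                double angle = -2.0 * Math.PI * j * k / n;
                re += x[j] * Math.cos(angle);
                im += x[j] * Math.sin(angle);
            }
            out[2 * k] = (float) re;       // real part of X[k]
            out[2 * k + 1] = (float) im;   // imaginary part of X[k]
        }
        return out;
    }

    /** Doubles every float of the interleaved spectrum, as the kernel in the question does. */
    static float[] multiplyOn2(float[] spectrum) {
        float[] result = new float[spectrum.length];
        for (int i = 0; i < spectrum.length; i++) {
            result[i] = spectrum[i] * 2.0f;
        }
        return result;
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f, 4f};
        float[] doubled = multiplyOn2(forwardR2C(x));
        System.out.println(java.util.Arrays.toString(doubled));
    }
}
```

For the input {1, 2, 3, 4}, the doubled spectrum is approximately {20, 0, -4, 4, -4, 0}, up to floating-point rounding. Comparing the GPU result for a small input with this output, with a tolerance, is a cheap sanity check.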

The method responsible for the multiplication uses a CUDA kernel function. Here is the Java method:

public static void multiplyOn2(Pointer inputDataPointer, int dataSize){

        // Enable exceptions and omit all subsequent error checks
        JCudaDriver.setExceptionsEnabled(true);

        // Create the PTX file by calling the NVCC
        String ptxFileName = null;
        try {
            ptxFileName = FileService.preparePtxFile("resources\\");
        } catch (IOException e) {
            // Without the PTX file the kernel cannot be loaded, so fail early
            throw new RuntimeException("Could not prepare the PTX file", e);
        }

        // Initialize the driver and create a context for the first device.
        cuInit(0);
        CUdevice device = new CUdevice();
        cuDeviceGet(device, 0);
        CUcontext context = new CUcontext();
        cuCtxCreate(context, 0, device);

        // Load the ptx file.
        CUmodule module = new CUmodule();
        cuModuleLoad(module, ptxFileName);

        // Obtain a function pointer to the "calcSpectrumSamples" function.
        CUfunction function = new CUfunction();
        cuModuleGetFunction(function, module, "calcSpectrumSamples");

        // Set up the kernel parameters: A pointer to an array
        // of pointers which point to the actual values.
        int N = (dataSize + 1) / 2 + 1;
        int pair = (dataSize + 1) % 2 > 0 ? 1 : -1;

        Pointer kernelParameters = Pointer.to(
                Pointer.to(inputDataPointer),
                Pointer.to(new int[] { dataSize }),
                Pointer.to(new int[] { N }),
                Pointer.to(new int[] { pair }));

        // Call the kernel function.
        int blockSizeX = 128;
        int gridSizeX = (int) Math.ceil((double) dataSize / blockSizeX);
        cuLaunchKernel(function, gridSizeX, 1, 1, // Grid dimension
                blockSizeX, 1, 1, // Block dimension
                0, null, // Shared memory size and stream
                kernelParameters, null // Kernel- and extra parameters
        );
        cuCtxSynchronize();

        // Allocate host output memory and copy the device output
        // to the host.
        float[] freq = new float[dataSize];
        cuMemcpyDtoH(Pointer.to(freq), (CUdeviceptr) inputDataPointer,
                dataSize * Sizeof.FLOAT);
}

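The grid size in multiplyOn2 is a ceiling division, so unless dataSize is a multiple of the block size, the last block contains threads past the end of the data. The following plain-Java sketch (LaunchConfig and gridSize are hypothetical names) reproduces that arithmetic with integer ceiling division, which avoids the round trip through double:

```java
/** Hypothetical helper mirroring the launch-configuration arithmetic in multiplyOn2. */
final class LaunchConfig {

    /** Integer ceiling division: number of blocks needed to cover dataSize elements. */
    static int gridSize(int dataSize, int blockSize) {
        return (dataSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        int dataSize = 65536 + 2;   // inputData.length + 2, as passed from main
        int blockSizeX = 128;
        int viaDouble = (int) Math.ceil((double) dataSize / blockSizeX);
        int viaInt = gridSize(dataSize, blockSizeX);
        // Surplus threads in the last block, which the kernel's bounds check must stop
        int surplus = viaInt * blockSizeX - dataSize;
        System.out.println(viaDouble + " " + viaInt + " " + surplus);
    }
}
```

Running main prints 513 513 126: for the 65536 + 2 floats passed from main, 513 blocks of 128 threads are launched and 126 threads have nothing to do, which is exactly what the kernel's `if(i >= dataSize) return;` is for.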
And the kernel function is as follows:

extern "C"

__global__ void calcSpectrumSamples(float* complexData, int dataSize, int N, int pair) {

    int i = threadIdx.x + blockIdx.x * blockDim.x;

    if(i >= dataSize) return;

    complexData[i] = complexData[i] * 2;
}

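The kernel maps one thread to one float of the interleaved spectrum, so the real and imaginary parts are both doubled, and the early return stops the surplus threads. A hypothetical sequential emulation in Java (KernelEmulation is an illustrative name; on the GPU the threads run in parallel, not in nested loops) makes the index mapping explicit:

```java
/** Hypothetical CPU emulation of the calcSpectrumSamples kernel's indexing. */
final class KernelEmulation {

    /** Runs the kernel body once for every (blockIdx, threadIdx) pair of the launch. */
    static void calcSpectrumSamples(float[] complexData, int dataSize,
                                    int gridDimX, int blockDimX) {
        for (int blockIdx = 0; blockIdx < gridDimX; blockIdx++) {
            for (int threadIdx = 0; threadIdx < blockDimX; threadIdx++) {
                int i = threadIdx + blockIdx * blockDimX;   // same formula as the kernel
                if (i >= dataSize) {
                    continue;                               // the kernel's early return
                }
                complexData[i] = complexData[i] * 2;
            }
        }
    }

    public static void main(String[] args) {
        int dataSize = 300;
        int blockDimX = 128;
        int gridDimX = (dataSize + blockDimX - 1) / blockDimX;  // 3 blocks, 384 threads
        float[] data = new float[dataSize];
        java.util.Arrays.fill(data, 1.5f);
        calcSpectrumSamples(data, dataSize, gridDimX, blockDimX);
        System.out.println(data[0] + " " + data[dataSize - 1]);
    }
}
```

main prints 3.0 3.0: with 300 floats and 128 threads per block, 3 blocks (384 threads) are launched and every element is doubled exactly once.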
But when I try to pass the pointer to the FFT result (in device memory) to the multiplyOn2 method, it throws an exception at the cuCtxSynchronize() call:

Exception in thread "main" jcuda.CudaException: CUDA_ERROR_UNKNOWN
    at jcuda.driver.JCudaDriver.checkResult(
    at jcuda.driver.JCudaDriver.cuCtxSynchronize(
    at com.ifntung.cufft.HilbertSpectrumTicksKernelInvoker.multiplyOn2(
    at com.ifntung.cufft.FFTTransformer.main(

I tried to do the same in C++ with Visual Studio, and there were no problems. Could you please help me?

P.S. I can work around this problem, but only by copying the data from device memory to host memory and back again, creating new pointers every time before calling another CUDA function, which slows down my program.

Answers (1)


Where exactly does the error occur, at which line?

The CUDA error can also be caused by a previous call.

Why do you wrap the pointer again here? You already have that device pointer. Are you now passing a pointer to the device pointer to the device?

Pointer kernelParameters =,  

I also recommend using the "this" qualifier, or some other marking, to identify instance variables. I hate looking through code, and refuse to do it, especially code as nested and long as your example, if I cannot see the scope of the variables in a method while trying to debug it just by reading it.
I don't want to keep asking myself where the hell a variable comes from.

If complex code in a question on SO is not formatted properly, I don't read it.

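To illustrate the "this" recommendation with a setter like setBatchSize from the question (BatchSettings is a hypothetical class written only for this example): when a parameter shadows a field, an unqualified assignment silently writes the parameter instead of the field, while the this. qualifier makes the target explicit.

```java
/** Hypothetical class showing why "this." helps when a parameter shadows a field. */
final class BatchSettings {

    private int batchSize = 1;

    /** Bug: assigns the parameter to itself; the field keeps its old value. */
    void setBatchSizeWrong(int batchSize) {
        batchSize = batchSize;
    }

    /** Correct: "this." makes it explicit that the instance field is written. */
    void setBatchSize(int batchSize) {
        this.batchSize = batchSize;
    }

    int getBatchSize() {
        return this.batchSize;
    }

    public static void main(String[] args) {
        BatchSettings settings = new BatchSettings();
        settings.setBatchSizeWrong(4);
        System.out.println(settings.getBatchSize()); // 1: the field was never written
        settings.setBatchSize(4);
        System.out.println(settings.getBatchSize()); // 4
    }
}
```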
