BluSteve

Why does cuSOLVER compute eigenvectors only on every other function call?

I'm writing a program to compare the speed of JBlas and JCublas. The first time I call the function below, everything works and `v` contains the correct eigenvectors. The second time, the call takes much less time, but `v` contains only the input symmetric matrix `a`, as if `d_A` had never been modified.

The function seems to work as expected only on odd-numbered calls. I suspect that something in GPU memory is not being cleared properly between calls, but I can't find what.
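One way to tell, call by call, whether the routine actually produced eigenvectors (rather than handing back the input) is to check the residual A v_j − w_j v_j on the host. This is a minimal plain-Java sketch, assuming `v` and `w` hold what `getSymEigenGPU` copies back (an n × n column-major matrix and n eigenvalues) and that `to1d` produces column-major data; `EigenCheck` and its methods are hypothetical names, not part of JCuda or jblas.

```java
/**
 * Hypothetical host-side sanity check: does each column v_j of v satisfy A v_j = w_j v_j?
 * Matrices are n x n, flattened column-major with leading dimension n,
 * i.e. element (i, j) is stored at index i + j * n (the layout cuSOLVER uses).
 */
final class EigenCheck {

    private EigenCheck() {}

    /** Returns the largest ||A v_j - w_j v_j||_inf / ||v_j||_inf over all columns j. */
    static double maxRelativeResidual(double[] a, double[] v, double[] w, int n) {
        double worst = 0.0;
        for (int j = 0; j < n; j++) {
            double resid = 0.0;
            double vnorm = 0.0;
            for (int i = 0; i < n; i++) {
                // (A v_j)_i = sum_k A(i, k) * v(k, j)
                double av = 0.0;
                for (int k = 0; k < n; k++) {
                    av += a[i + k * n] * v[k + j * n];
                }
                resid = Math.max(resid, Math.abs(av - w[j] * v[i + j * n]));
                vnorm = Math.max(vnorm, Math.abs(v[i + j * n]));
            }
            // A zero column cannot be an eigenvector, so treat it as a failure.
            worst = Math.max(worst, vnorm == 0.0 ? Double.POSITIVE_INFINITY : resid / vnorm);
        }
        return worst;
    }

    /** True if every column of v is numerically an eigenvector of a with eigenvalue w[j]. */
    static boolean looksLikeEigenvectors(double[] a, double[] v, double[] w, int n, double tol) {
        return maxRelativeResidual(a, v, w, n) <= tol;
    }
}
```

For example, calling `EigenCheck.looksLikeEigenvectors(a1d, v, w, n, tol)` right after the two device-to-host copies would be expected to return true on a call that worked and false on a call that left `d_A` untouched. The tolerance is absolute, so it should scale with the magnitude of `a` (for instance a small multiple of machine epsilon times the largest |w[j]|).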

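```java
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.jcusolver.JCusolverDn;
import jcuda.jcusolver.cusolverDnHandle;
import jcuda.runtime.JCuda;
import org.jblas.DoubleMatrix;

import static jcuda.jcublas.cublasFillMode.CUBLAS_FILL_MODE_UPPER;
import static jcuda.jcusolver.cusolverEigMode.CUSOLVER_EIG_MODE_VECTOR;
import static jcuda.runtime.cudaMemcpyKind.cudaMemcpyDeviceToHost;
import static jcuda.runtime.cudaMemcpyKind.cudaMemcpyHostToDevice;

// to1d, from1d, pp and NanoStopWatch are helpers not shown here.
public static void getSymEigenGPU(cusolverDnHandle handle, DoubleMatrix a) {
    int n2 = a.length; // total number of elements, n * n
    int n = a.rows;
    double[] a1d = to1d(a);
    double[] v = new double[n2];
    double[] w = new double[n];

    Pointer h_A = Pointer.to(a1d);
    Pointer h_V = Pointer.to(v);
    Pointer h_W = Pointer.to(w);
    Pointer d_A = new Pointer();
    Pointer d_V = new Pointer();
    Pointer d_W = new Pointer();

    Pointer d_work = new Pointer();

    // Device buffers: A (n x n), V (n x n) and W (n eigenvalues)
    JCuda.cudaMalloc(d_A, (long) n2 * Sizeof.DOUBLE);
    JCuda.cudaMalloc(d_V, (long) n2 * Sizeof.DOUBLE);
    JCuda.cudaMalloc(d_W, n * Sizeof.DOUBLE);

    int jobz = CUSOLVER_EIG_MODE_VECTOR; // compute eigenvalues and eigenvectors
    int uplo = CUBLAS_FILL_MODE_UPPER;

    JCuda.cudaMemcpy(d_A, h_A, (long) n2 * Sizeof.DOUBLE,
            cudaMemcpyHostToDevice);

    // Query the workspace size and allocate the workspace
    int[] lworkl = new int[1];
    JCusolverDn.cusolverDnDsyevd_bufferSize(handle, jobz, uplo, n, d_A, n,
            d_W, lworkl);
    int lwork = lworkl[0];
    JCuda.cudaMalloc(d_work, (long) lwork * Sizeof.DOUBLE);

    // Run the symmetric eigensolver in place on d_A and time the call
    NanoStopWatch sw = NanoStopWatch.sw();
    JCusolverDn.cusolverDnDsyevd(handle, jobz, uplo, n, d_A, n,
            d_W, d_work, n2, new Pointer());
    System.out.println("sw.stop() = " + sw.stop());

    // Copy the eigenvalues and the contents of d_A back to the host
    JCuda.cudaMemcpy(h_W, d_W, Sizeof.DOUBLE * n, cudaMemcpyDeviceToHost);
    JCuda.cudaMemcpy(h_V, d_A, (long) Sizeof.DOUBLE * n2,
            cudaMemcpyDeviceToHost);

    pp(from1d(v));

    JCuda.cudaFree(d_A);
    JCuda.cudaFree(d_V);
    JCuda.cudaFree(d_W);
    JCuda.cudaFree(d_work);
}
```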
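cuSOLVER, like LAPACK, stores matrices in column-major order with leading dimension lda (n in the call above). With `CUSOLVER_EIG_MODE_VECTOR`, a successful `cusolverDnDsyevd` (info = 0) overwrites A so that column j holds the orthonormal eigenvector for `w[j]`, with the eigenvalues in ascending order. How `to1d`/`from1d` lay out the data is not shown, so the following sketch uses hypothetical helpers (`ColumnMajor.flatten`, `ColumnMajor.column`) just to make that indexing explicit; if `from1d` reads `v` row by row, the eigenvectors would appear as rows rather than columns.

```java
/**
 * Hypothetical column-major helpers (not the question's to1d/from1d, whose bodies are not shown).
 * Element (i, j) of an n x n matrix is stored at flat index i + j * n, i.e. lda = n.
 */
final class ColumnMajor {

    private ColumnMajor() {}

    /** Flattens a square Java array m[i][j] (row i, column j) into column-major order. */
    static double[] flatten(double[][] m) {
        int n = m.length;
        double[] flat = new double[n * n];
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < n; i++) {
                flat[i + j * n] = m[i][j];
            }
        }
        return flat;
    }

    /** Copies column j out of a column-major n x n array (the eigenvector for w[j] after syevd). */
    static double[] column(double[] flat, int n, int j) {
        double[] col = new double[n];
        System.arraycopy(flat, j * n, col, 0, n);
        return col;
    }
}
```

Both helpers assume a square, non-jagged matrix. For a symmetric input the storage order of A itself does not matter, but it does matter when reading the eigenvectors back out of `v`.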
