Afshinzkh

Reputation: 109

Unexpected results when calling cublas in C++/CLI and C#

I have written a wrapper in C++/CLI with Visual Studio to use CUDA's cuBLAS library. I am using CUDA Toolkit 7.0.

Here is the source code of my wrapper:

#pragma once

#include "stdafx.h"
#include "BLAS.h"
#include "cuBLAS.h"

namespace lab
{
    namespace Mathematics
    {
        namespace CUDA
        {

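            void BLAS::DAXPY(int n, double alpha, const array<double> ^x, int incx, array<double> ^y, int incy)
            {
                // Pin the managed arrays so the garbage collector cannot move them
                // while the native cuBLAS code reads x and writes y.
                pin_ptr<double> xPtr = &(x[0]);
                pin_ptr<double> yPtr = &(y[0]);
                pin_ptr<double> alphaPtr = &alpha;

                // Forward to the native CUDA implementation (y = alpha * x + y)
                cuBLAS::DAXPY(n, alphaPtr, xPtr, incx, yPtr, incy);
            }
        }
    }
}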

To test this code, I wrote the following test in C#:

using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Linq;
using lab.Mathematics.CUDA;

namespace lab.Mathematics.CUDA.Test
{
  [TestClass]
  public class TestBLAS
  {
    [TestMethod]
    public void TestDAXPY()
    {
        var count = 10;
        var alpha = 1.0;
        var a = Enumerable.Range(0, count).Select(x => Convert.ToDouble(x)).ToArray();
        var b = Enumerable.Range(0, count).Select(x => Convert.ToDouble(x)).ToArray();

        // Call CUDA
        BLAS.DAXPY(count, alpha, a, 1, b, 1);

        // Validate results
        for (int i = 0; i < count; i++)
        {
            Assert.AreEqual(i + i, b[i]);
        }
    }
  }
}

The project compiles for the x64 platform without errors, but the test results differ on every run: the output array b contains different values each time, and I don't know why.
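For reference, DAXPY computes y[i] = alpha * x[i] + y[i], so with a = b = {0, 1, ..., 9} and alpha = 1.0 the test expects b[i] == 2 * i. A small host-side reference such as the sketch below gives a deterministic baseline to compare the GPU output against. It assumes unit strides only, and the names `daxpy_reference` and `run_test_inputs` are hypothetical helpers, not part of cuBLAS or of the original project.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for DAXPY with unit strides: y[i] = alpha * x[i] + y[i].
// y is read before it is overwritten, so its initial contents affect the result.
void daxpy_reference(double alpha, const std::vector<double>& x, std::vector<double>& y)
{
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}

// Builds the same inputs as the C# test (0, 1, ..., count - 1 in both vectors),
// runs the reference, and returns the resulting b.
std::vector<double> run_test_inputs(int count, double alpha)
{
    std::vector<double> a(count), b(count);
    for (int i = 0; i < count; ++i) {
        a[i] = static_cast<double>(i);
        b[i] = static_cast<double>(i);
    }
    daxpy_reference(alpha, a, b);
    return b;
}
```

If the CUDA path disagrees with this reference, and disagrees differently from run to run, the problem lies in the device code or the build rather than in the test's expectation.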

I am also adding my CUDA code in case someone can spot a problem there. Note that I don't get any errors or warnings while compiling. I also wonder whether I need to change some compilation settings, since I didn't change anything and used the default options.

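#include <sstream>
#include <stdexcept>
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include "cuBLAS.h"  // assumed to declare class cuBLAS (same header the wrapper includes)

void cuBLAS::DAXPY(int n, const double *alpha, const double *x, int incx, double *y, int incy)
{
    cudaError_t cudaStat;

    // Allocate GPU memory for packed (stride 1) copies of x and y
    double *devX = nullptr, *devY = nullptr;
    cudaStat = cudaMalloc((void **)&devX, (size_t)n * sizeof(*devX));
    if (cudaStat != cudaSuccess) {
        // Throw by value (not with new) so callers can catch std::exception&
        std::ostringstream msg;
        msg << "device memory allocation failed: cudaStat = " << cudaStat;
        throw std::runtime_error(msg.str());
    }
    cudaStat = cudaMalloc((void **)&devY, (size_t)n * sizeof(*devY));
    if (cudaStat != cudaSuccess) {
        cudaFree(devX);
        std::ostringstream msg;
        msg << "device memory allocation failed: cudaStat = " << cudaStat;
        throw std::runtime_error(msg.str());
    }

    // Create cuBLAS handle
    cublasHandle_t handle;
    cublasCreate(&handle);

    // Copy BOTH vectors to the device: DAXPY reads y as well as writing it.
    // Host strides are incx/incy; the device copies are packed with stride 1,
    // so n elements fit in the n-element buffers allocated above.
    cublasSetVector(n, sizeof(*devX), x, incx, devX, 1);
    cublasSetVector(n, sizeof(*devY), y, incy, devY, 1);

    // devY = alpha * devX + devY (alpha is a host pointer, the default pointer mode)
    cublasDaxpy(handle, n, alpha, devX, 1, devY, 1);

    // Copy the result back into the host vector y, honouring its stride
    cublasGetVector(n, sizeof(*devY), devY, 1, y, incy);

    // Free GPU resources
    cudaFree(devX);
    cudaFree(devY);
    cublasDestroy(handle);
}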
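cublasSetVector(n, elemSize, x, incx, y, incy) copies n elements, reading every incx-th element of the source and writing every incy-th element of the destination (cublasGetVector works the same way in the other direction). With a stride of inc, n elements span 1 + (n - 1) * inc slots, so a device buffer of only n elements is large enough only when the device-side stride is 1. The host-only sketch below models that copy pattern; `span_length` and `strided_copy` are hypothetical helpers, not cuBLAS functions, and they assume positive strides.

```cpp
#include <cstddef>
#include <vector>

// Number of slots a strided vector of n elements occupies: 1 + (n - 1) * inc.
// Hypothetical helper; assumes inc >= 1.
std::size_t span_length(int n, int inc)
{
    return n <= 0 ? 0 : 1 + static_cast<std::size_t>(n - 1) * static_cast<std::size_t>(inc);
}

// Host-only model of the cublasSetVector / cublasGetVector copy pattern:
// element i is read from src[i * incSrc] and written to dst[i * incDst].
// Returns false instead of writing out of bounds when a buffer is too short.
bool strided_copy(int n, const std::vector<double>& src, int incSrc,
                  std::vector<double>& dst, int incDst)
{
    if (span_length(n, incSrc) > src.size() || span_length(n, incDst) > dst.size())
        return false;
    for (int i = 0; i < n; ++i)
        dst[static_cast<std::size_t>(i) * incDst] = src[static_cast<std::size_t>(i) * incSrc];
    return true;
}
```

For example, copying 10 elements with host stride 2 into a packed 10-element buffer succeeds, while copying them into a 10-element buffer with stride 2 would need 19 slots and is rejected.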

EDIT: I added the change suggested by David Yaw (copying y to the device with cublasSetVector) and also added error checks for all CUDA operations, though I left most of them out here for readability. It still doesn't work.

Upvotes: 1

Views: 355

Answers (2)

Afshinzkh

Reputation: 109

So the code written above is fine. The only problem was that I wasn't compiling it properly. According to this tutorial, every time you change your CUDA code (specifically the .cu file), you have to REBUILD the whole project so that Parallel Nsight recompiles it; otherwise it keeps using the result of the previous compilation.

It is a very small point, but it might save many people a whole day of debugging that gets nowhere.

Upvotes: 0

David Yaw

Reputation: 27864

Your error is in these lines.

// Initialize the input matrix and vector
cublasSetVector(n, sizeof(*devX), x, incx, devX, incx);

// Call cuBLAS function
cublasDaxpy(handle, n, alpha, devX, incx, devY, incy);

// Retrieve resulting vector
cublasGetVector(n, sizeof(*devY), devY, incy, y, incy);

Quoting the documentation (emphasis mine):

This function multiplies the vector x by the scalar α and adds it to the vector y overwriting the latest vector with the result.

y is both an input and an output, but you never set its value on the device, so cublasDaxpy adds to whatever junk is in the uninitialized memory returned by cudaMalloc. Add a call to cublasSetVector to copy the initial value of y into devY before you call cublasDaxpy.
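To make the effect concrete, the sketch below models both sequences on the host, without any CUDA calls. The buffer standing in for devY is pre-filled with an arbitrary value, because actually reading uninitialized memory is undefined behaviour in C++; `simulate_daxpy` and the `junk` parameter are hypothetical names for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Host-side model of the device workflow (hypothetical; no CUDA calls).
// 'junk' stands in for whatever a fresh cudaMalloc buffer happens to contain.
std::vector<double> simulate_daxpy(const std::vector<double>& x,
                                   const std::vector<double>& y,
                                   double alpha, bool copyYToDevice, double junk)
{
    std::vector<double> devX(x);               // like cublasSetVector for x
    std::vector<double> devY(y.size(), junk);  // like an uninitialized cudaMalloc buffer
    if (copyYToDevice)
        devY = y;                              // like cublasSetVector for y (the fix)

    // Like cublasDaxpy: each devY element is read, then overwritten.
    for (std::size_t i = 0; i < devY.size() && i < devX.size(); ++i)
        devY[i] = alpha * devX[i] + devY[i];

    return devY;                               // like cublasGetVector back into y
}
```

Without the copy, every result element carries the junk value, and since real device memory is not zeroed and can hold different leftovers on each run, this is consistent with b changing from run to run.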

Upvotes: 2
