Gswffye

Reputation: 300

CUDA: unified memory, using arrays

I'm trying to get unified memory to work with classes, and to pass and manipulate arrays in unified memory with kernel calls. I want to pass everything by reference.

So I'm overloading operator new (and operator new[]) for my classes and arrays so they are accessible by the GPU, but I think I need to add more code to keep the arrays themselves in unified memory, and I'm not quite sure how to do this. I get a memory access error when the fillArray() method is called.

If I have to do these sorts of operations (arithmetic on arrays and copying between different-sized arrays) hundreds of times, is unified memory a good approach, or should I stick with manually copying between CPU and GPU memory? Thank you very much!

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <iostream>
#include <stdio.h>


#define TILE_WIDTH 4

#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif

__global__ void add1(int height, int width, int *a, int *resultArray)
{
    int w = blockIdx.x * blockDim.x + threadIdx.x; // Col // width
    int h = blockIdx.y * blockDim.y + threadIdx.y;
    int index = h * width + w;

    if ((w < width) && (h < height))
        resultArray[index] = a[index] + 1;
}

class Managed 
{
public:
    void *operator new(size_t len) 
    {
        void *ptr;
        cudaMallocManaged(&ptr, len);
        return ptr;
    }

    void operator delete(void *ptr)
    {
        cudaFree(ptr);
    }

    void *operator new[](size_t len)
    {
        void *ptr;
        cudaMallocManaged(&ptr, len);
        return ptr;
    }

    void operator delete[](void *ptr)
    {
        cudaFree(ptr);
    }
};

class testArray : public Managed
{
public: 
    testArray()
    {
        height = 16;
        width = 8;
        myArray = new int[height*width];
    }
    ~testArray()
    {
        delete[] myArray;
    }

    CUDA_CALLABLE_MEMBER void runTest()
    {
        fillArray(myArray);
        printArray(myArray);

        dim3 dimGridWidth((width - 1) / TILE_WIDTH + 1, (height - 1)/TILE_WIDTH + 1, 1);
        dim3 dimBlock(TILE_WIDTH, TILE_WIDTH, 1);

        add1<<<dimGridWidth,dimBlock>>>(height, width, myArray, myArray);
        cudaDeviceSynchronize();
        printArray(myArray);
    }

private:

    int *myArray;
    int height; 
    int width;

    void fillArray(int *myArray)
    {
        for (int i = 0; i < height; i++){
            for (int j = 0; j < width; j++)
                myArray[i*width+j] = i*width+j;
        }
    }

    void printArray(int *myArray)
    {
        for (int i = 0; i < height; i++){
            for (int j = 0; j < width; j++)
                printf("%i ",myArray[i*width+j]);
            printf("\n");
        }
    }
};

int main()
{
    testArray *test = new testArray;
    test->runTest();

    //testArray test;
    //test.runTest();

    system("pause");
    return 0;
}

Upvotes: 0

Views: 1292

Answers (2)

Gswffye

Reputation: 300

It is very simple, but I didn't know how to do it. Anyway, if you change this line of code:

myArray = new int[height*width];

to the following, it looks like it works.

cudaMallocManaged(&myArray, height * width * sizeof(int));

If you have another way of doing this, I'd be interested in seeing it.
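For reference, a minimal sketch of how the constructor and destructor of the posted testArray could look with that change (error checking omitted, as in the original; note the buffer must now be released with cudaFree rather than delete[]):

testArray()
{
    height = 16;
    width = 8;
    // Allocate the int buffer itself in unified memory so both host code
    // (fillArray/printArray) and the kernel can access it.
    cudaMallocManaged(&myArray, height * width * sizeof(int));
}

~testArray()
{
    cudaFree(myArray); // pairs with cudaMallocManaged, not delete[]
}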

Upvotes: 1

danielschemmel

Reputation: 11126

Your error is very simple: myArray is allocated on the host, not as unified memory.

The reason is that while testArray derives from Managed (and thus your testArray *test = new testArray allocates unified memory), the allocation done inside its constructor allocates an array of ints, which is not a type derived from Managed.

Therefore your pointer lives in unified memory, but points to host memory.

Just off the top of my head, something like the following should help:

struct UnifiedInt : Managed { int value; /* implement a few convenience functions */ };
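Fleshing that out a bit, a wrapper along these lines could work (this is only a sketch of what those convenience functions might be, not part of the original suggestion; CUDA_CALLABLE_MEMBER is the macro from the question):

struct UnifiedInt : Managed
{
    int value;

    CUDA_CALLABLE_MEMBER UnifiedInt() : value(0) {}
    CUDA_CALLABLE_MEMBER UnifiedInt(int v) : value(v) {}

    // Convenience: let the wrapper be used roughly like a plain int
    // from both host and device code.
    CUDA_CALLABLE_MEMBER operator int() const { return value; }
    CUDA_CALLABLE_MEMBER UnifiedInt &operator=(int v) { value = v; return *this; }
};

// Because operator new[] is inherited from Managed, an array of UnifiedInt
// allocated with new[] on the host lives in unified memory:
// UnifiedInt *data = new UnifiedInt[height * width];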

Upvotes: 1
