Reputation: 300
I'm trying to get unified memory to work with classes, and to pass and manipulate arrays in unified memory from kernel calls. I want to pass everything by reference.
To do this, I'm overloading operator new and operator new[] in a base class so that objects are accessible from the GPU. I suspect I need more code to get the arrays themselves into unified memory, but I'm not sure how to do that. I get a memory access error when the fillArray() method is called.
If I have to do these sorts of operations (arithmetic on arrays and copying between differently sized arrays) hundreds of times, is unified memory a good approach, or should I stick with manually copying between CPU and GPU memory? Thank you very much!
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <iostream>
#include <stdio.h>
#define TILE_WIDTH 4
#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif
__global__ void add1(int height, int width, int *a, int *resultArray)
{
    int w = blockIdx.x * blockDim.x + threadIdx.x; // Col // width
    int h = blockIdx.y * blockDim.y + threadIdx.y;
    int index = h * width + w;
    if ((w < width) && (h < height))
        resultArray[index] = a[index] + 1;
}
class Managed
{
public:
    void *operator new(size_t len)
    {
        void *ptr;
        cudaMallocManaged(&ptr, len);
        return ptr;
    }
    void Managed::operator delete(void *ptr)
    {
        cudaFree(ptr);
    }
    void* operator new[] (size_t len) {
        void *ptr;
        cudaMallocManaged(&ptr, len);
        return ptr;
    }
    void Managed::operator delete[] (void* ptr) {
        cudaFree(ptr);
    }
};
class testArray : public Managed
{
public:
    testArray()
    {
        height = 16;
        width = 8;
        myArray = new int[height*width];
    }
    ~testArray()
    {
        delete[] myArray;
    }
    CUDA_CALLABLE_MEMBER void runTest()
    {
        fillArray(myArray);
        printArray(myArray);
        dim3 dimGridWidth((width - 1) / TILE_WIDTH + 1, (height - 1)/TILE_WIDTH + 1, 1);
        dim3 dimBlock(TILE_WIDTH, TILE_WIDTH, 1);
        add1<<<dimGridWidth,dimBlock>>>(height, width, myArray, myArray);
        cudaDeviceSynchronize();
        printArray(myArray);
    }
private:
    int *myArray;
    int height;
    int width;
    void fillArray(int *myArray)
    {
        for (int i = 0; i < height; i++){
            for (int j = 0; j < width; j++)
                myArray[i*width+j] = i*width+j;
        }
    }
    void printArray(int *myArray)
    {
        for (int i = 0; i < height; i++){
            for (int j = 0; j < width; j++)
                printf("%i ",myArray[i*width+j]);
            printf("\n");
        }
    }
};
int main()
{
    testArray *test = new testArray;
    test->runTest();
    //testArray test;
    //test.runTest();
    system("pause");
    return 0;
}
Upvotes: 0
Views: 1292
Reputation: 300
The fix turned out to be simple, I just didn't know how to do it. If you change this line of code:
myArray = new int[height*width];
to the following, it appears to work:
cudaMallocManaged(&myArray, height * width * sizeof(int));
The destructor then has to release the array with cudaFree(myArray) rather than delete[] myArray. If you have another way of doing this, I'd be interested in seeing it.
Upvotes: 1
Reputation: 11126
Your error is very simple: myArray is allocated on the host, not as unified memory.
The reason for that is that while testArray derives from Managed (and thus your testArray *test = new testArray allocates unified memory), the allocation that is being done inside its constructor allocates an array of ints, which are not derived from Managed.
Therefore your pointer lives in unified memory, but points to host memory.
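The dispatch rule behind this can be checked without a GPU. The following is only a host-side sketch: ManagedStandIn, TestArrayLike, the allocations counter, and countClassLevelAllocations are hypothetical names introduced for illustration, and std::malloc stands in for cudaMallocManaged so the code builds without the CUDA toolkit. It counts how often the class-level operator new is actually reached.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Host-only stand-in for the Managed base class (assumption: std::malloc
// replaces cudaMallocManaged so this compiles without CUDA).
// The counter is a hypothetical addition used only to observe the calls.
struct ManagedStandIn {
    static int allocations;  // calls that reached the class-level operator new

    void* operator new(std::size_t len) {
        ++allocations;
        void* ptr = std::malloc(len);
        if (!ptr) throw std::bad_alloc();
        return ptr;
    }
    void operator delete(void* ptr) { std::free(ptr); }
};
int ManagedStandIn::allocations = 0;

// Same shape as testArray: the object derives from the base class,
// but its buffer is a plain array of int.
struct TestArrayLike : ManagedStandIn {
    int* data;
    TestArrayLike() : data(new int[16 * 8]) {}  // int is not derived from the base:
                                                // this uses the global operator new[]
    ~TestArrayLike() { delete[] data; }
};

// Creates and destroys one object and reports how many allocations
// went through the class-level operator new.
int countClassLevelAllocations() {
    ManagedStandIn::allocations = 0;
    TestArrayLike* t = new TestArrayLike;  // class-level operator new: counted
    delete t;
    return ManagedStandIn::allocations;
}
```

Calling countClassLevelAllocations() from main returns 1: only the object itself goes through the overloaded operator, while the int buffer is served by the global allocator, which in the CUDA version means ordinary host memory.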
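Just off the top of my head, something like the following should help. A fundamental type such as int cannot be used as a base class, so the value is stored as a member instead:
struct UnifiedInt : Managed { int value; /* implement a few convenience functions */ };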
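One possible shape for such a wrapper is sketched below, again as a host-only illustration rather than a tested CUDA implementation. ManagedArrayStandIn, the arrayAllocations counter, the value member, the two convenience operators, and sumThroughWrapper are all assumptions made for this example, and std::malloc stands in for cudaMallocManaged.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Host-only stand-in for Managed's array operators (assumption: std::malloc
// replaces cudaMallocManaged). The counter is hypothetical, for observation only.
struct ManagedArrayStandIn {
    static int arrayAllocations;

    void* operator new[](std::size_t len) {
        ++arrayAllocations;
        void* ptr = std::malloc(len);
        if (!ptr) throw std::bad_alloc();
        return ptr;
    }
    void operator delete[](void* ptr) { std::free(ptr); }
};
int ManagedArrayStandIn::arrayAllocations = 0;

// int cannot appear in a base-class list, so this sketch holds it as a member
// and adds two convenience functions so the wrapper reads like an int.
struct UnifiedInt : ManagedArrayStandIn {
    int value;
    UnifiedInt& operator=(int v) { value = v; return *this; }
    operator int() const { return value; }
};

// Fills an array of n wrappers with 0..n-1 and returns their sum.
int sumThroughWrapper(int n) {
    UnifiedInt* a = new UnifiedInt[n];  // reaches the class-level operator new[]
    for (int i = 0; i < n; ++i) a[i] = i;
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += a[i];
    delete[] a;                         // reaches the class-level operator delete[]
    return sum;
}
```

With this approach the array allocation is routed through the base class, but a kernel such as add1 would have to accept UnifiedInt* instead of int*, which is one reason calling cudaMallocManaged directly in the constructor may be the simpler route.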
Upvotes: 1