I'm new to GPU compute, and I'm trying to run some demo code on my AMD Radeon RX 7800 XT GPU using ROCm 5.7 / HIP.
I've looked through some sources, but I can't find an answer to why the provided code doesn't work. I've been following this tutorial with this repository.
The first problem was that I had to insert #define __HIP_PLATFORM_AMD__ at the beginning of the file. Why does the tutorial not mention this?
But the major problem is that the compiler rejects the line int id = blockDim.x * blockIdx.x + threadIdx.x; and the kernel launch
with these errors:
GPUCompute\main.cpp line 16 error: use of undeclared identifier 'blockDim'
GPUCompute\main.cpp line 16 error: use of undeclared identifier 'blockIdx'
GPUCompute\main.cpp line 16 error: use of undeclared identifier 'threadIdx'
GPUCompute\main.cpp line 56 error: use of undeclared identifier 'hipLaunchKernelGGL'
Note that I shortened the path for privacy reasons.
What should I do? Do I need to declare these identifiers myself? If so, how do I know what values to give them?
My code is:
#define __HIP_PLATFORM_AMD__
#include <hip/hip_runtime.h>
#include <hip/amd_detail/amd_hip_runtime.h>
#include <stdio.h>
#include <stdlib.h>  // malloc, free, exit
#include <math.h>    // ceil, fabs
#include <iostream>
// Size of array
#define N 1048576
using namespace std;
// Kernel
__global__ void vector_addition(double *a, double *b, double *c)
{
    int id = blockDim.x * blockIdx.x + threadIdx.x;
    if(id < N)
        c[id] = a[id] + b[id];
}
// Main program
int main()
{
    // Number of bytes to allocate for N doubles
    size_t bytes = N*sizeof(double);
    // Allocate memory for arrays A, B, and C on host
    double *A = (double*)malloc(bytes);
    double *B = (double*)malloc(bytes);
    double *C = (double*)malloc(bytes);
    // Allocate memory for arrays d_A, d_B, and d_C on device
    double *d_A, *d_B, *d_C;
    hipMalloc(&d_A, bytes);
    hipMalloc(&d_B, bytes);
    hipMalloc(&d_C, bytes);
    // Fill host arrays A, B, and C
    for(int i=0; i<N; i++)
    {
        A[i] = 1.0;
        B[i] = 2.0;
        C[i] = 0.0;
    }
    // Copy data from host arrays A and B to device arrays d_A and d_B
    hipMemcpy(d_A, A, bytes, hipMemcpyHostToDevice);
    hipMemcpy(d_B, B, bytes, hipMemcpyHostToDevice);
    // Set execution configuration parameters
    // thr_per_blk: number of HIP threads per grid block
    // blk_in_grid: number of blocks in grid
    int thr_per_blk = 128;
    int blk_in_grid = ceil(float(N) / thr_per_blk);
    // Launch kernel
    hipLaunchKernelGGL(vector_addition, blk_in_grid, thr_per_blk, 0, 0, d_A, d_B, d_C);
    // Copy data from device array d_C to host array C
    hipMemcpy(C, d_C, bytes, hipMemcpyDeviceToHost);
    // Verify results
    double tolerance = 1.0e-14;
    for(int i=0; i<N; i++)
    {
        if(fabs(C[i] - 3.0) > tolerance)
        {
            printf("Error: value of C[%d] = %f instead of 3.0\n", i, C[i]);
            exit(-1);
        }
    }
    // Free CPU memory
    free(A);
    free(B);
    free(C);
    // Free GPU memory
    hipFree(d_A);
    hipFree(d_B);
    hipFree(d_C);
    printf("\n---------------------------\n");
    printf("__SUCCESS__\n");
    printf("---------------------------\n");
    printf("N = %d\n", N);
    printf("Threads Per Block = %d\n", thr_per_blk);
    printf("Blocks In Grid = %d\n", blk_in_grid);
    printf("---------------------------\n\n");
    return 0;
}
Note: I'm coding in Code::Blocks and (I think) I configured it to use the compiler found in the ROCm directory. I set the search directories for both the compiler and the linker.
My specifications (if one would care) are:
As for the desired behavior: I simply want the code to compile and perform the vector addition on my GPU.