user1333885

Reputation: 13

CUDA letter counter

I have a problem with CUDA. I want to write a small program that counts letters in a char array.

I read the letters from a file and store the number of letters read in an int variable called N. Then I allocate the host buffer with malloc:

char *b_h, *b_d;
size_t size_char = N * sizeof(char);
b_h = (char *)malloc(size_char);

After malloc, I read the file again and assign each letter to an element of the char array (this works):

int j=0;
while(fscanf(file,"%c",&l)!=EOF)
{
    b_h[j]=l;
    j++;
}

After that I create an int variable (a_h) to use as the counter:

int *a_h, *a_d;
size_t size_count = 1*sizeof(int);
a_h = (int *)malloc(size_count);
a_h[0] = 0; // malloc does not zero memory, so start the count at 0 explicitly

Now the CUDA part. Allocate device memory:

cudaMalloc((void **) &a_d, size_count);
cudaMalloc((void **) &b_d, size_char);

Copy from host to device:

cudaMemcpy(a_d, a_h, size_count, cudaMemcpyHostToDevice);
cudaMemcpy(b_d, b_h, size_char, cudaMemcpyHostToDevice);

Set the block size and launch the kernel:

int block_size = 4;
int n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
square_array <<< n_blocks, block_size >>> (a_d,b_d,c_d, N);

Copy the results back to the host:

cudaMemcpy(a_h, a_d, size_count, cudaMemcpyDeviceToHost);
cudaMemcpy(b_h, b_d, size_char, cudaMemcpyDeviceToHost);

And print the count:

printf("Count: %d\n", a_h[0]);

It doesn't work. The char array contains the sentence "Super testSuper test". I'm looking for the letter 'e', and I get a_h[0] = 1. Where is the problem?

CUDA function:

__global__ void square_array(int *a, char *b, int *c, int N)
{
    const char* letter = "e";

    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    if (idx<N)
    {
        if(b[idx] == *letter)
        {
            a[0]++;
        }
    }
}

Please help me.

Upvotes: 1

Views: 627

Answers (1)

Roger Dahl

Reputation: 15734

I'm guessing that N is small enough that your GPU can run all of your threads in parallel. You start one thread for each character in the array. The threads, all running simultaneously, don't see each other's output. The statement a[0]++ is not a single operation: each thread reads the value of a[0] (which is 0), increases it by 1, and stores the result (1). If this is homework, that is probably the basic lesson the professor wanted to impart.

When multiple threads store a value to the same location simultaneously, it is undefined which thread's value ends up stored. In your case that doesn't matter, because every thread that stores a value stores the same value, 1.

A typical solution is to have each thread store a 0 or a 1 in its own separate location (depending on whether there is a match), and then add up the values in a separate step.
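A minimal sketch of that approach, not a tuned implementation. The names mark_matches, count_letter_flags, and flags are made up for illustration, and the sketch assumes the summing step is done in a plain loop on the host (a parallel reduction on the device would be the usual choice for large inputs):

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread writes 1 or 0 into its own slot of flags,
// so no two threads ever write to the same address.
__global__ void mark_matches(const char *text, int *flags, int n, char letter)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        flags[idx] = (text[idx] == letter) ? 1 : 0;
}

// Hypothetical host helper: returns the number of matches, or -1 on error.
int count_letter_flags(const char *text_h, int n, char letter)
{
    if (n <= 0) return 0;

    char *text_d = NULL;
    int *flags_d = NULL;
    int *flags_h = (int *)malloc(n * sizeof(int));
    if (flags_h == NULL) return -1;

    int count = -1;
    if (cudaMalloc((void **)&text_d, n * sizeof(char)) == cudaSuccess &&
        cudaMalloc((void **)&flags_d, n * sizeof(int)) == cudaSuccess &&
        cudaMemcpy(text_d, text_h, n * sizeof(char),
                   cudaMemcpyHostToDevice) == cudaSuccess)
    {
        int block_size = 256;
        int n_blocks = (n + block_size - 1) / block_size; // round up
        mark_matches<<<n_blocks, block_size>>>(text_d, flags_d, n, letter);

        // This copy waits for the kernel to finish and reports its errors.
        if (cudaMemcpy(flags_h, flags_d, n * sizeof(int),
                       cudaMemcpyDeviceToHost) == cudaSuccess)
        {
            // Separate step: add up the per-thread results on the host.
            count = 0;
            for (int i = 0; i < n; i++)
                count += flags_h[i];
        }
    }

    cudaFree(text_d);   // cudaFree(NULL) is a harmless no-op
    cudaFree(flags_d);
    free(flags_h);
    return count;
}
```

With the sentence from the question, count_letter_flags("Super testSuper test", 20, 'e') should return 4 on a working device, since the sentence contains four 'e' characters.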

You can also use an atomic increment operation, such as CUDA's atomicAdd.
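A minimal sketch of the atomic version, using CUDA's atomicAdd. The names count_letter_kernel and count_letter_atomic are made up for illustration, and the counter is zeroed on the device with cudaMemset rather than copied from the host:

```cuda
#include <cuda_runtime.h>

// Every matching thread increments the same counter, but atomicAdd performs
// the read-modify-write as one indivisible step, so no increment is lost.
__global__ void count_letter_kernel(const char *text, int *count, int n, char letter)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n && text[idx] == letter)
        atomicAdd(count, 1);
}

// Hypothetical host helper: returns the number of matches, or -1 on error.
int count_letter_atomic(const char *text_h, int n, char letter)
{
    if (n <= 0) return 0;

    char *text_d = NULL;
    int *count_d = NULL;
    int count_h = 0;
    int result = -1;

    if (cudaMalloc((void **)&text_d, n * sizeof(char)) == cudaSuccess &&
        cudaMalloc((void **)&count_d, sizeof(int)) == cudaSuccess &&
        cudaMemcpy(text_d, text_h, n * sizeof(char),
                   cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemset(count_d, 0, sizeof(int)) == cudaSuccess) // counter starts at 0
    {
        int block_size = 256;
        int n_blocks = (n + block_size - 1) / block_size; // round up
        count_letter_kernel<<<n_blocks, block_size>>>(text_d, count_d, n, letter);

        // This copy waits for the kernel to finish and reports its errors.
        if (cudaMemcpy(&count_h, count_d, sizeof(int),
                       cudaMemcpyDeviceToHost) == cudaSuccess)
            result = count_h;
    }

    cudaFree(text_d);   // cudaFree(NULL) is a harmless no-op
    cudaFree(count_d);
    return result;
}
```

This keeps the structure of the original kernel; the only change to the counting logic is replacing a[0]++ with atomicAdd. When many threads match, they all serialize on one address, so the separate-location approach may scale better for large inputs.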

Upvotes: 2
