user1745292

Reputation: 57

GPU CUDA code for array processing, abstract

I have an array of several million integer values (input). I would like to apply a function F(input[x]) to each element independently on a GPU (NVIDIA GTX 780 Ti or GTX 980), then get the results array (output) back in main memory, with each element output[x] corresponding to the input element input[x]. F() does not contain any floating-point calculations.

How do I properly organize a task on an array of this size (millions of elements) for the GPU?

I'm looking for a proper GPU substitute for this:

for (int x = 0; x < 5000000; x++)
    output[x] = F(input[x]);

Upvotes: 2

Views: 403

Answers (1)

m.s.

Reputation: 16354

So that this question has a proper answer, I am converting the comments into an answer:

Your use case is very easy to implement in CUDA. A beginner-friendly way to do it is to use Thrust, whose thrust::transform applies a functor to every element of a range.

#include <iostream>
#include <iterator>   // std::ostream_iterator
#include <thrust/copy.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/device_vector.h>

struct F
{
     __device__
     int operator()(int value) const
     {
         // just a dummy function
         return value*value;
     }
};

int main()
{
     const int N = 10;
     thrust::device_vector<int> input(N);
     // filling the input with dummy values
     thrust::sequence(input.begin(), input.end());
     thrust::device_vector<int> output(N);
     thrust::transform(input.begin(), input.end(), output.begin(), F());
     thrust::copy(output.begin(), output.end(), std::ostream_iterator<int>(std::cout, " "));

     return 0;
}

Compiling and running this code yields:

$ nvcc transform.cu && ./a.out

0 1 4 9 16 25 36 49 64 81
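
The example above generates its input directly on the device. The question starts with data in main memory and wants the results back there, so here is a minimal sketch of that round trip with Thrust. The helper name transform_on_gpu and the variable names are hypothetical, and the body of F is still only a placeholder for the real integer function.

```cuda
#include <cstdio>
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>

struct F
{
    __device__
    int operator()(int value) const
    {
        // placeholder body; substitute the real integer function here
        return value * value;
    }
};

// Hypothetical helper: copies the input to the GPU, applies F to every
// element, and returns the results in a host-side vector.
std::vector<int> transform_on_gpu(const std::vector<int>& input)
{
    // constructing a device_vector from host iterators copies host -> device
    thrust::device_vector<int> d_input(input.begin(), input.end());
    thrust::device_vector<int> d_output(input.size());

    thrust::transform(d_input.begin(), d_input.end(), d_output.begin(), F());

    // thrust::copy from device iterators to host iterators copies device -> host
    std::vector<int> output(input.size());
    thrust::copy(d_output.begin(), d_output.end(), output.begin());
    return output;
}

int main()
{
    const int N = 5000000;
    std::vector<int> input(N);
    for (int x = 0; x < N; x++)
        input[x] = x % 1000;   // small dummy values so value*value cannot overflow

    std::vector<int> output = transform_on_gpu(input);
    std::printf("output[3] = %d, output[%d] = %d\n", output[3], N - 1, output[N - 1]);
    return 0;
}
```

For a single pass over a few million integers, the two copies across the PCIe bus are likely to take a noticeable share of the total time, so it is worth timing the whole round trip rather than only the transform.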

Of course, you can also write a very simple, plain CUDA kernel to accomplish this task as Robert suggested.
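
A minimal sketch of what such a plain kernel could look like, assuming one thread per array element. The names apply_F_kernel and run_on_gpu and the block size of 256 are illustrative choices, not something taken from the comments, and F is again a placeholder.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// placeholder for the real integer function
__device__ int F(int value)
{
    return value * value;
}

// Each thread handles one element: output[x] = F(input[x]).
__global__ void apply_F_kernel(const int* input, int* output, int n)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x < n)   // the last block may have threads past the end of the array
        output[x] = F(input[x]);
}

// Hypothetical host-side wrapper: host -> device copy, kernel launch,
// device -> host copy.
std::vector<int> run_on_gpu(const std::vector<int>& input)
{
    const int n = static_cast<int>(input.size());
    std::vector<int> output(n);
    if (n == 0)
        return output;   // a launch with zero blocks is not valid

    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int* d_input = nullptr;
    int* d_output = nullptr;
    cudaMalloc(&d_input, bytes);
    cudaMalloc(&d_output, bytes);
    cudaMemcpy(d_input, input.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                          // assumed block size
    const int blocks = (n + threads - 1) / threads;   // round up to cover all n elements
    apply_F_kernel<<<blocks, threads>>>(d_input, d_output, n);

    cudaError_t err = cudaGetLastError();             // reports launch errors
    if (err != cudaSuccess)
        std::fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));

    // this blocking copy also waits for the kernel to finish
    cudaMemcpy(output.data(), d_output, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_input);
    cudaFree(d_output);
    return output;
}

int main()
{
    const int N = 5000000;
    std::vector<int> input(N);
    for (int x = 0; x < N; x++)
        input[x] = x % 1000;   // small dummy values so value*value cannot overflow

    std::vector<int> output = run_on_gpu(input);
    std::printf("output[3] = %d, output[%d] = %d\n", output[3], N - 1, output[N - 1]);
    return 0;
}
```

With 5,000,000 elements and 256 threads per block this launches 19,532 blocks in a one-dimensional grid, which is within the grid-size limit of both cards mentioned in the question.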

Upvotes: 2
