I have an array (input) of several million integer values. I would like to apply a function F(input[x]) to each element individually and independently on a GPU (NVIDIA GTX 780 Ti or GTX 980), and then have the results array (output) back in main memory, with each output element output[x] corresponding to input element input[x]. F() does not contain any floating-point calculations.
How do I properly organize a task on an array of this size (millions of elements) for the GPU?
I'm looking for a proper GPU substitute for this:
for (int x = 0; x < 5000000; x++)
    output[x] = F(input[x]);
To give this question an answer, I have converted the comments into this answer:
Your use case is very easily implemented in CUDA. A very beginner-friendly way to do this is using Thrust.
#include <iostream>
#include <iterator>          // std::ostream_iterator
#include <thrust/copy.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/device_vector.h>

// Functor applied to every element; executed on the GPU
struct F
{
    __device__
    int operator()(int value) const
    {
        // just a dummy function
        return value*value;
    }
};

int main()
{
    const int N = 10;
    thrust::device_vector<int> input(N);
    // filling the input with dummy values 0, 1, ..., N-1
    thrust::sequence(input.begin(), input.end());
    thrust::device_vector<int> output(N);
    // apply F to every element in parallel: output[x] = F()(input[x])
    thrust::transform(input.begin(), input.end(), output.begin(), F());
    // copy the results back to the host and print them
    thrust::copy(output.begin(), output.end(), std::ostream_iterator<int>(std::cout, " "));
    return 0;
}
Compiling and running this code yields:
$ nvcc transform.cu && ./a.out
0 1 4 9 16 25 36 49 64 81
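The example above uses N = 10 and prints straight from device memory. Since the question starts from a host array of millions of elements and wants the results back in main memory, a slightly extended sketch might look like the following. It assumes the input already lives in a std::vector<int>, reuses the dummy squaring F (marked __host__ __device__ here so it can also be called on the CPU), and the helper name transform_on_gpu is hypothetical. The device_vector range constructor performs the host-to-device copy, and thrust::copy brings the results back.

```cuda
#include <cassert>
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Same dummy functor as above; replace the body with your integer-only F().
struct F
{
    __host__ __device__
    int operator()(int value) const
    {
        return value * value;
    }
};

// Hypothetical helper: copies `input` to the GPU, applies F to every element
// in parallel, and copies the results back into a host (main memory) vector.
std::vector<int> transform_on_gpu(const std::vector<int>& input)
{
    // Host-to-device copy happens in the device_vector range constructor.
    thrust::device_vector<int> d_input(input.begin(), input.end());
    thrust::device_vector<int> d_output(input.size());

    thrust::transform(d_input.begin(), d_input.end(), d_output.begin(), F());

    // Device-to-host copy of the results.
    std::vector<int> output(input.size());
    thrust::copy(d_output.begin(), d_output.end(), output.begin());
    return output;
}

int main()
{
    const int N = 5000000;  // the size used in the question's loop
    std::vector<int> input(N);
    for (int x = 0; x < N; x++)
        input[x] = x % 1000;  // dummy values that keep x*x inside int range

    std::vector<int> output = transform_on_gpu(input);

    // output[x] corresponds to input[x], exactly like the sequential loop.
    for (int x = 0; x < N; x++)
        assert(output[x] == input[x] * input[x]);
    return 0;
}
```

Thrust picks the launch configuration itself, so the same call works unchanged whether the vector holds ten elements or millions.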
Of course, as Robert suggested, you can also accomplish this task with a very simple, plain CUDA kernel.
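A minimal sketch of such a kernel follows. It is not Robert's code (his comments are not reproduced here); it assumes the same dummy squaring F() as above, 256 threads per block (an arbitrary but common choice), and the helper names applyF, check and apply_on_gpu are hypothetical. One thread is launched per element, and a grid-stride loop keeps the kernel correct even if fewer threads than elements were launched.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical integer-only F(); replace with the real function.
__host__ __device__ int F(int value)
{
    return value * value;
}

// Grid-stride loop: each thread handles idx, idx + stride, idx + 2*stride, ...
__global__ void applyF(const int* input, int* output, int n)
{
    for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < n;
         idx += blockDim.x * gridDim.x)
    {
        output[idx] = F(input[idx]);
    }
}

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err)
{
    if (err != cudaSuccess)
    {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(1);
    }
}

// Copies input to the GPU, runs the kernel, and copies the results back.
std::vector<int> apply_on_gpu(const std::vector<int>& input)
{
    const int n = static_cast<int>(input.size());
    const size_t bytes = n * sizeof(int);
    std::vector<int> output(n);
    if (n == 0)
        return output;

    int *d_in = nullptr, *d_out = nullptr;
    check(cudaMalloc(&d_in, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_in, input.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // one thread per element
    applyF<<<blocks, threads>>>(d_in, d_out, n);
    check(cudaGetLastError());

    // cudaMemcpy waits for the kernel to finish before copying back.
    check(cudaMemcpy(output.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_out));
    return output;
}

int main()
{
    const int N = 5000000;
    std::vector<int> input(N);
    for (int x = 0; x < N; x++)
        input[x] = x % 1000;  // dummy values

    std::vector<int> output = apply_on_gpu(input);

    // F is __host__ __device__, so the CPU can check the GPU results.
    for (int x = 0; x < N; x++)
        assert(output[x] == F(input[x]));
    std::printf("ok\n");
    return 0;
}
```

With 5,000,000 ints the input and output are 20 MB each, so for a cheap F() the two cudaMemcpy transfers over PCIe may well take longer than the kernel itself; that is worth measuring before optimizing the kernel.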