Reputation: 1307
Using VexCL in C++, I am trying to count all values in a vector that are at or above a certain minimum, and I would like to perform this count on the device. The default Reductors only provide MIN, MAX and SUM, and the examples do not clearly show how to perform such an operation. This code is slow, probably because it is executed on the host instead of the device:
int amount = 0;
int minimum = 5;
for (vex::vector<int>::iterator i = vector.begin(); i != vector.end(); ++i)
{
    if (*i >= minimum)
    {
        amount++;
    }
}
The vector I am using will consist of a large number of values, say millions, most of them zero. Besides the number of values that reach the minimum, I would also like to retrieve a list of the vector indices that hold these values. Is this possible?
Upvotes: 0
Views: 288
Reputation: 1307
@ddemidov
Thanks for your help, it is working. However, it is much slower than my original code, which copies the device vector to the host and sorts it using Boost. Below is the sample code with some timings:
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
#include <boost/range/algorithm.hpp>
#include <vexcl/vexcl.hpp>

int main()
{
    std::clock_t start, end;

    // initialize vector with random numbers
    std::vector<int> hostVector(1000000);
    for (std::size_t i = 0; i < hostVector.size(); ++i)
    {
        hostVector[i] = std::rand() % 20 + 1;
    }

    // copy to device
    vex::Context cpu(vex::Filter::Type(CL_DEVICE_TYPE_CPU) && vex::Filter::Any);
    vex::Context gpu(vex::Filter::Type(CL_DEVICE_TYPE_GPU) && vex::Filter::Any);
    vex::vector<int> vectorCPU(cpu, 1000000);
    vex::vector<int> vectorGPU(gpu, 1000000);
    vex::copy(hostVector, vectorCPU);
    vex::copy(hostVector, vectorGPU);

    // sort results on CPU
    start = std::clock();
    boost::sort(hostVector);
    end = std::clock();
    std::cout << "C++: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    // sort results on OpenCL
    start = std::clock();
    vex::sort(vectorCPU, vex::greater<int>());
    end = std::clock();
    std::cout << "vexcl CPU: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    start = std::clock();
    vex::sort(vectorGPU, vex::greater<int>());
    end = std::clock();
    std::cout << "vexcl GPU: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    return 0;
}
which results in:
C++: 17 ms
vexcl CPU: 737 ms
vexcl GPU: 1670 ms
using an i7 3770 CPU and a (slow) HD4650 graphics card. From what I have read, OpenCL should be able to sort large vectors quickly. Do you have any advice on how to perform a fast sort using OpenCL and VexCL?
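When comparing timings like the ones above, two things may be worth ruling out: std::clock() reports processor time on many platforms rather than elapsed wall-clock time, and the first call to a library routine can include one-time setup costs. The following is only a sketch of a wall-clock timing helper with untimed warm-up runs, using std::chrono. The name timed_ms and its parameters are hypothetical and not part of VexCL or Boost. For device code, the callable would also have to wait for the device to finish and restore the unsorted input before each run; neither is shown here.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not part of VexCL or Boost): runs `work` `warmups`
// times without timing it, then runs it once more and returns the elapsed
// wall-clock time of that last run in milliseconds.
inline double timed_ms(const std::function<void()>& work, int warmups = 1)
{
    // Untimed warm-up runs absorb one-time costs of the first call.
    for (int i = 0; i < warmups; ++i)
    {
        work();
    }

    // steady_clock is monotonic, so the difference is a valid duration.
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

A call could look like timed_ms([&]{ boost::sort(hostVector); }), keeping in mind that the warm-up run would already leave hostVector sorted unless the input is restored inside the callable.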
Upvotes: 0
Reputation: 1761
If you only needed to count elements above the minimum, this would be as simple as
vex::Reductor<int, vex::SUM> sum(ctx);
int amount = sum( vec >= minimum );
The vec >= minimum expression results in a sequence of ones and zeros, and sum then counts the ones.
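To make the idea concrete without a device, here is a minimal host-side sketch in plain C++ (not VexCL) of counting by summing 0/1 comparison flags. The function name count_at_least is hypothetical and only serves as an illustration; it can also be used to cross-check a device result.

```cpp
#include <numeric>
#include <vector>

// Host-side illustration only (plain C++, not VexCL): each comparison
// contributes 1 when the element reaches the minimum and 0 otherwise,
// so the sum of the flags equals the number of matching elements.
inline int count_at_least(const std::vector<int>& values, int minimum)
{
    return std::accumulate(
        values.begin(), values.end(), 0,
        [minimum](int running, int v) {
            return running + (v >= minimum ? 1 : 0);
        });
}
```

For the input 1, 3, 5, 2, 6, 8, 0, 2, 4, 7 and a minimum of 5, the flags are 0 0 1 0 1 1 0 0 0 1, which sum to 4.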
Now, since you also need to get the positions of the elements above the minimum, it gets a bit more complicated:
#include <iostream>
#include <vexcl/vexcl.hpp>

int main() {
    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));

    // Input vector
    vex::vector<int> vec(ctx, {1, 3, 5, 2, 6, 8, 0, 2, 4, 7});
    int n = vec.size();
    int minimum = 5;

    // Put result of (vec >= minimum) into key, and element indices into pos:
    vex::vector<int> key(ctx, n);
    vex::vector<int> pos(ctx, n);
    key = (vec >= minimum);
    pos = vex::element_index();

    // Get number of interesting elements in vec.
    vex::Reductor<int, vex::SUM> sum(ctx);
    int amount = sum(key);

    // Sort pos by key in descending order.
    vex::sort_by_key(key, pos, vex::greater<int>());

    // The first 'amount' elements of pos now hold the indices of the
    // interesting elements. Let's use a slicer to extract them:
    vex::vector<int> indices(ctx, amount);
    vex::slicer<1> slice(vex::extents[n]);
    indices = slice[vex::range(0, amount)](pos);

    std::cout << "indices: " << indices << std::endl;
}
This gives the following output:
indices: {
0: 2 4 5 9
}
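For checking the device output, the same key/pos idea can be written as a host-side reference in plain C++ (not VexCL). This is only a sketch under assumptions: the name indices_at_least is hypothetical, and std::stable_sort is used here so that the matching indices stay in ascending order, which is a property of this host version and not a statement about the device sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side reference (plain C++, not VexCL), hypothetical name.
// Mirrors the key/pos approach: positions are ordered so that those whose
// element reaches the minimum (key 1) come before the rest (key 0), and
// the first 'amount' positions are returned.
inline std::vector<int> indices_at_least(const std::vector<int>& values, int minimum)
{
    // pos = 0, 1, 2, ..., n-1
    std::vector<int> pos(values.size());
    std::iota(pos.begin(), pos.end(), 0);

    // Sort positions by key in descending order; stable_sort keeps the
    // original index order among positions with equal keys.
    std::stable_sort(pos.begin(), pos.end(), [&values, minimum](int a, int b) {
        return (values[a] >= minimum) > (values[b] >= minimum);
    });

    // Number of interesting elements.
    const auto amount = std::count_if(
        values.begin(), values.end(),
        [minimum](int v) { return v >= minimum; });

    // Keep only the leading 'amount' positions.
    pos.resize(static_cast<std::size_t>(amount));
    return pos;
}
```

With the input vector from the example above and a minimum of 5, this returns the indices 2, 4, 5 and 9.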
Upvotes: 1