Reputation: 5789
The program below (well, the lines after "from here") is a construct i have to use a lot. I was wondering whether it is possible (eventually using functions from the eigen library) to vectorize or otherwise make this program run faster.
Essentially, given a vector of float
x
, this construct has recover the indexes
of the sorted elements of x
in a int
vector SIndex
. For example, if the first
entry of SIndex
is 10, it means that the 10th element of x
was the smallest element
of x
.
#include <algorithm>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <vector>
using std::vector;
using namespace std;
typedef pair<int, float> sortData;
bool sortDataLess(const sortData& left, const sortData& right){
return left.second<right.second;
}
int main(){
int n=20,i;
float LO=-1.0,HI=1.0;
srand (time(NULL));
vector<float> x(n);
vector<float> y(n);
vector<int> SIndex(n);
vector<sortData> foo(n);
for(i=0;i<n;i++) x[i]=LO+(float)rand()/((float)RAND_MAX/(HI-LO));
//from here:
for(i=0;i<n;i++) foo[i]=sortData(i,x[i]);
sort(foo.begin(),foo.end(),sortDataLess);
for(i=0;i<n;i++){
sortData bar=foo[i];
y[i]=x[bar.first];
SIndex[i]=bar.first;
}
for(i=0;i<n;i++) std::cout << SIndex[i] << std::endl;
return 0;
}
Upvotes: 1
Views: 176
Reputation: 36497
There's no getting around the fact that this is a sorting problem, and vectorization doesn't necessarily improve sorts very much. For example, the partition step of quicksort can do the comparison in parallel, but it then needs to select and store the 0–n values that passed the comparison. This can absolutely be done, but it starts throwing out the advantages you get from vectorization—you need to convert from a comparison mask to a shuffle mask, which is probably a lookup table (bad), and you need a variable-sized store, which means no alignment (bad, although maybe not that bad). Mergesort needs to merge two sorted lists, which in some cases could be improved by vectorization, but in the worst case (I think) needs the same number of steps as the scalar case.
And, of course, there's a decent chance that any major speed boost you get from vectorization will have already been done inside your standard library's std::sort
implementation. To get it, though, you'd need to be sorting primitive types with the default comparison operator.
If you're worried about performance, you can easily avoid the last loop, though. Just sort a list of indices using your float array as a comparison:
struct IndirectLess {
template <typename T>
IndirectLess(T iter) : values(&*iter) {}
bool operator()(int left, int right)
{
return values[left] < values[right];
}
float const* values;
};
int main() {
// ...
std::vector<int> SIndex;
SIndex.reserve(n);
for (int i = 0; i < n; ++i)
SIndex.push_back(n);
std::sort(SIndex.begin(), SIndex.end(), IndirectLess(x.begin()));
// ...
}
Now you've only produced your list of sorted indices. You have the potential to lose some cache locality, so for really big lists it might be slower. At that point it might be possible to vectorize your last loop, depending on the architecture. It's just data manipulation, though—read four values, store 1st and 3rd in one place and 2nd and 4th in another—so I wouldn't expect Eigen to help much at that point.
Upvotes: 1