Reputation: 4032
I have implemented my own comparator for STL's sort
function, that helps sorting a std::vector< std::vector<int> >
on the CPU.
The user gives as an input a std::vector< std::vector<int> >
and also a string variable like for example 021
. By having this string the sorting is done first on the first column, then on the third column and then on the second column. Example:
1 2 3
3 2 1
1 1 1
1 1 2
Let's say the string is 10
The output will be
1 1 1
1 1 2
1 2 3
3 2 1
My CPU implementation uses a class called Sorting
, this class is implemented with the following two files:
Sorting.h
class Sorting{
private:
public:
Sorting();
~Sorting();
std::vector<std::vector<int>> applySort(std::vector<std::vector<int>>
data,const std::string& attr);
};
Sorting.cpp
Sorting::Sorting(){}
Sorting::~Sorting(){}
std::vector<std::vector<int>> Sorting::applySort(
std::vector<std::vector<int>> data, const std::string& attr){
std::sort(data.begin(), data.begin()+data.size(), Comparator(attr));
return data;
}
Comparator.h
class Comparator{
private:
std::string attr;
public:
Comparator(const std::string& attr) { this->attr = attr; }
bool operator()(const std::vector<int>& first, const std::vector<int>&
second){
size_t i;
for(i=0;i<attr.size();i++){
if(first[attr.at(i) - '0'] < second[attr.at(i) - '0']) return true;
else if(first[attr.at(i) - '0'] > second[attr.at(i)-'0'])
return false;
}
return false;
}
};
My implementation has been tested and it works properly. I'm interested in doing a similar CUDA implementation that would take advantage of GPU's capabilities in order to produce the output much more faster.
Initially I thought because my goal is a bit confusing, maybe changing an already known implementation for sorting in the GPU would do my job. However I started searching many implementations like the one described here: http://blogs.nvidia.com/2012/09/how-tesla-k20-speeds-up-quicksort-a-familiar-comp-sci-code/ and it made me realize that it would be something difficult to achieve.
I'm not sure if this is the best course of action. I started searching for libraries and found Thrust
. However, although Thrust allows you to define your own comparator, in a question I asked yesterday I learned that it's not possible to create a host_vector < host_vector<int> >
.
And I guess transforming my vector of vectors to a single vector wouldn't help me that much because I have no idea how I would then have to implement my comparator class.
I would like to hear your opinion on this issue:
Thrust
?Thrust
don't allow me to do so.Thank you in advance
Upvotes: 2
Views: 2447
Reputation: 26
i have found a reasonable method which can finally beat the CPU (not in terms of time but in terms of data elements) actually my new method involves using of thrust::mismatch and i am attaching code for the function
The good thing about this version is that running time of this function is 2ms approx. with very large amount of data such as N = 1000000 to N = 1000
, anyways i am posting the function code and do let me know if you find any of user find some other improvements which can reduce the overall running time
template<typename Ivec>
int lexoMM(Ivec vec, int bv1, int bv2, int N){
typedef thrust::device_vector<int>::iterator Iterator;
thrust::pair<Iterator,Iterator> result;
result = thrust::mismatch(vec.begin()+bv1, vec.begin()+(bv1+N-1), vec.begin()+bv2);
if(result.first == vec.end()){
//cout<<"Both are equal (right order)"<<endl;
return 1;
}
else if(result.first>result.second){
//cout<<"Wrong order"<<endl;
//cout<<*result.first<<","<<*result.second;
return 0;
}
else{
//cout<<"Right order"<<endl;
//cout<<*result.first<<","<<*result.second;
return 1;
}
}
PS: i feel like i really wasted my time in order to implementing my own version of this same thing, but thrust
is awsome :)
Upvotes: 0
Reputation: 26
i see that you are trying to implement a lexicographical sorting technique(but i have did it with single 1D huge vector), well i have been there and i have implemented a function which sorts the vectors but actually it is lagging way behind the lexicographical sorting, anyways i am not sure if i can post the code here, so if you need any help i would be glad to help
PS: look into the implementation of lexicographical_sort.cu in thrust example code (i have tweaked it also but that one also is lagging behind) The comparator function you might need in order to check from two distintive places in 1D vector (which contains all the data) is listed down (by the way, this technique is way slower then CPU) but who know you might come up with idea to improve it or use it better then i do
struct arbitrary_functor
{
template <typename Tuple> __host__ __device__
void operator()(Tuple t)
{
if(thrust::get<0>(t)>thrust::get<1>(t))
thrust::get<2>(t) = 1;
else
thrust::get<2>(t) = 0;
}
};
int checkLexo_1vec(const thrust::device_vector<char> & A, int bv1, int bv2, int N){
int i;
thrust::device_vector<char> temp(N);
thrust::device_vector<char> sum(N);
thrust::for_each(thrust::make_zip_iterator(
thrust::make_tuple(A.begin()+bv2, A.begin()+bv1,temp.begin())),
thrust::make_zip_iterator(
thrust::make_tuple(A.end()+(bv2+N),A.end()+(bv1+N), temp.end())),
arbitrary_functor());
thrust::inclusive_scan(temp.begin(),temp.end(),sum.begin());
int a = thrust::lower_bound(sum.begin(),sum.end(),1) - sum.begin();
thrust::for_each(thrust::make_zip_iterator(
thrust::make_tuple(A.begin()+bv1, A.begin()+bv2, temp.begin())),
thrust::make_zip_iterator(
thrust::make_tuple(A.end()+(bv1+N), A.end()+(bv2+N),temp.end())),
arbitrary_functor());
thrust::inclusive_scan(temp.begin(),temp.end(),sum.begin());
int b = thrust::lower_bound(sum.begin(),sum.end(),1) - sum.begin();
if(a<=b)
return 1;
else
return 0;
}
Upvotes: 1