ksm001
ksm001

Reputation: 4032

CUDA: Sorting a vector< vector<int> > on the GPU

I have implemented my own comparator for STL's sort function, that helps sorting a std::vector< std::vector<int> > on the CPU. The user gives as an input a std::vector< std::vector<int> > and also a string variable like for example 021. By having this string the sorting is done first on the first column, then on the third column and then on the second column. Example:

  1 2 3
  3 2 1
  1 1 1
  1 1 2

Let's say the string is 10

The output will be

  1 1 1
  1 1 2
  1 2 3
  3 2 1

My CPU implementation uses a class called Sorting, this class is implemented with the following two files:

Sorting.h

class Sorting{

   private:

   public:

Sorting();
~Sorting();
std::vector<std::vector<int>> applySort(std::vector<std::vector<int>>
    data,const std::string& attr);

 };

Sorting.cpp

 Sorting::Sorting(){}

 Sorting::~Sorting(){}


 std::vector<std::vector<int>> Sorting::applySort(
      std::vector<std::vector<int>> data, const std::string& attr){

         std::sort(data.begin(), data.begin()+data.size(), Comparator(attr));

         return data;
  }

Comparator.h

   class Comparator{

   private:
     std::string attr;

   public:
     Comparator(const std::string& attr) { this->attr = attr; }

     bool operator()(const std::vector<int>& first, const std::vector<int>&
          second){

    size_t i;
    for(i=0;i<attr.size();i++){
        if(first[attr.at(i) - '0'] < second[attr.at(i) - '0']) return true;
        else if(first[attr.at(i) - '0'] > second[attr.at(i)-'0'])          
             return  false;
    }
    return false;
  }

  };

My implementation has been tested and it works properly. I'm interested in doing a similar CUDA implementation that would take advantage of GPU's capabilities in order to produce the output much more faster.

Initially I thought because my goal is a bit confusing, maybe changing an already known implementation for sorting in the GPU would do my job. However I started searching many implementations like the one described here: http://blogs.nvidia.com/2012/09/how-tesla-k20-speeds-up-quicksort-a-familiar-comp-sci-code/ and it made me realize that it would be something difficult to achieve.

I'm not sure if this is the best course of action. I started searching for libraries and found Thrust. However, although Thrust allows you to define your own comparator, in a question I asked yesterday I learned that it's not possible to create a host_vector < host_vector<int> >.

And I guess transforming my vector of vectors to a single vector wouldn't help me that much because I have no idea how I would then have to implement my comparator class.

I would like to hear your opinion on this issue:

  1. how should I approach this problem?
  2. Is it possible to achieve it with Thrust?
  3. Would doing it in the GPU give a much better performance on my overall code? Note that the vector of vectors can be huge, millions of rows but only a few(5-10) columns.
  4. Would it be better to design my own sort or change a sort function that is already available? This although sounds like a good idea, in practice I feel like it would take me a lot of effort to achieve. Using a simple comparator and a sort function from a library would be the best for me, however the limitations of Thrust don't allow me to do so.

Thank you in advance

Upvotes: 2

Views: 2447

Answers (2)

Asif Ali
Asif Ali

Reputation: 26

i have found a reasonable method which can finally beat the CPU (not in terms of time but in terms of data elements) actually my new method involves using of thrust::mismatch and i am attaching code for the function

The good thing about this version is that running time of this function is 2ms approx. with very large amount of data such as N = 1000000 to N = 1000, anyways i am posting the function code and do let me know if you find any of user find some other improvements which can reduce the overall running time

template<typename Ivec>    
int lexoMM(Ivec vec, int bv1, int bv2, int N){

  typedef thrust::device_vector<int>::iterator Iterator;
  thrust::pair<Iterator,Iterator> result;

  result = thrust::mismatch(vec.begin()+bv1, vec.begin()+(bv1+N-1), vec.begin()+bv2);

if(result.first == vec.end()){
      //cout<<"Both are equal (right order)"<<endl;
      return 1;
  }
else if(result.first>result.second){
    //cout<<"Wrong order"<<endl;
    //cout<<*result.first<<","<<*result.second;
    return 0;
}
else{
    //cout<<"Right order"<<endl;
    //cout<<*result.first<<","<<*result.second;
    return 1;
    }

}

PS: i feel like i really wasted my time in order to implementing my own version of this same thing, but thrust is awsome :)

Upvotes: 0

Asif Ali
Asif Ali

Reputation: 26

i see that you are trying to implement a lexicographical sorting technique(but i have did it with single 1D huge vector), well i have been there and i have implemented a function which sorts the vectors but actually it is lagging way behind the lexicographical sorting, anyways i am not sure if i can post the code here, so if you need any help i would be glad to help

PS: look into the implementation of lexicographical_sort.cu in thrust example code (i have tweaked it also but that one also is lagging behind) The comparator function you might need in order to check from two distintive places in 1D vector (which contains all the data) is listed down (by the way, this technique is way slower then CPU) but who know you might come up with idea to improve it or use it better then i do

struct arbitrary_functor
{
    template <typename Tuple>   __host__ __device__
        void operator()(Tuple t)
        {
            if(thrust::get<0>(t)>thrust::get<1>(t))
                thrust::get<2>(t) = 1;
            else
                thrust::get<2>(t) = 0;
        }
};

int checkLexo_1vec(const thrust::device_vector<char> & A, int bv1, int bv2, int N){
    int i;

    thrust::device_vector<char> temp(N);
    thrust::device_vector<char> sum(N);

    thrust::for_each(thrust::make_zip_iterator(
                        thrust::make_tuple(A.begin()+bv2, A.begin()+bv1,temp.begin())),
                    thrust::make_zip_iterator(
                        thrust::make_tuple(A.end()+(bv2+N),A.end()+(bv1+N), temp.end())),
                    arbitrary_functor());


    thrust::inclusive_scan(temp.begin(),temp.end(),sum.begin());

    int a = thrust::lower_bound(sum.begin(),sum.end(),1) - sum.begin();

    thrust::for_each(thrust::make_zip_iterator(
                        thrust::make_tuple(A.begin()+bv1, A.begin()+bv2, temp.begin())),
                    thrust::make_zip_iterator(
                        thrust::make_tuple(A.end()+(bv1+N), A.end()+(bv2+N),temp.end())),
                    arbitrary_functor());

    thrust::inclusive_scan(temp.begin(),temp.end(),sum.begin());

    int b = thrust::lower_bound(sum.begin(),sum.end(),1) - sum.begin();

    if(a<=b)
        return 1;
    else
        return 0;
}

Upvotes: 1

Related Questions