AatG
AatG

Reputation: 725

sorting table in place using stl sort

I have a HUGE table (about 50Gb) in (i,j,k) format (from a sparse matrix) stored as

uint32_t * idx1, * idx2;
float * vals;
uint32_t tablesize;

and I'd like to sort it in place with a given comparison function that's a function of idx1 and idx2. Can this be done using std::sort?

Specifically, each nonzero entry (i,j) with value v in the sparse matrix is stored by placing i in idx1, j in idx2, and v in the corresponding entry in vals. I'd like to then sort these entries according to (i1, j1, v1) <= (i2, j2, v2) if

(i1 < i2) || (i1==i2 && j1 <= j2)

The examples I've been able to scrounge up of using std::sort on nonstandard datatypes assume that each item being compared is a single instance of a class; here each item is represented by three values in different arrays.

Upvotes: 8

Views: 1565

Answers (2)

TemplateRex
TemplateRex

Reputation: 70516

If you have to keep using your existing data structure, which is essentially a std::tuple of three std::vectors, using boost::zip_iterator would seem to be the way to go. A zip_iterator treats three iterators (two to indices and one to a value) as a single tuple, and you can use a custom comparison function object to sort your data in-place. Alas, boost::zip_iterator can't be used with std::sort, as explained in this Q&A, because it can't be written into.

This means that you would have to write your own zip_iterator class that can be used with std::sort. Note that it is not a trivial exercise, see this Q&A and/or this paper.

It is a lot easier to sort a std::vector of a std::tuple. My attempt below uses a std::tuple of two indices and a value, and stores those entries into a std::vector. For the sorting, I use a C++14 generic lambda that forwards the two indices into a smaller tuple and compares those lexicographically (i.e. first on row-index, then on column-index) using the library operator< of std::tuple.

#include <algorithm>
#include <iostream>
#include <tuple>
#include <vector>

using index = uint32_t;
using value = float;
using sparse_entry = std::tuple<index, index, value>;
using sparse_matrix = std::vector<sparse_entry>;

int main()
{
    // sparse 3x3 matrix
    auto m = sparse_matrix { 
        std::make_tuple( 1, 1, -2.2), 
        std::make_tuple( 1, 0, 42  ), 
        std::make_tuple( 0, 2,  3.4), 
        std::make_tuple( 0, 1,  1.7) 
    };    

    // sort by row-index, then column-index
    std::sort(begin(m), end(m), [](auto const& L, auto const& R) {
        return 
            std::forward_as_tuple(std::get<0>(L), std::get<1>(L)) <
            std::forward_as_tuple(std::get<0>(R), std::get<1>(R))
        ;
    });

    for (auto const& elem : m)
        std::cout << "{ " << std::get<0>(elem) << ", " << std::get<1>(elem) << ", " << std::get<2>(elem) << "}, \n";
}

Live Example.

If your application can use this transformed data layout (and there maybe cache-performance reasons why it can't), then the above code will do the sorting as you want it.

NOTE: as @Casey mentions, you can also use std::tie instead of std::forward_as_tuple, but that can bite you when you change sparse_entry into a full-fledged user-defined class with getters returning by-value.

Upvotes: 1

StilesCrisis
StilesCrisis

Reputation: 16290

It is unfortunately quite difficult to convince std::sort, or any of the standard library, to work with striped data. It is designed to assume that data can be copied via a single =, moved via one move or swapped via one swap.

Your best bet is to use boost::iterator_facade to write a custom iterator class which wraps the data, and hides the striped data format from std::sort. I've wanted to do something similar in the past but my workspace does not allow us to use boost. EDIT: when your facade is dereferenced, it will probably need to create some sort of proxy object that can be assigned/moved/swapped and will do the right thing to each of the stripe arrays. It's not trivial.

The next best bet is to make an array of ints from zero to N, each one representing an index into your striped data array. Write a custom functor to std::sort which sorts this array to match your criteria. It's obviously far from ideal when you have such a large data set.

Upvotes: 3

Related Questions