Reputation: 725
I have a HUGE table (about 50Gb) in (i,j,k) format (from a sparse matrix) stored as
uint32_t * idx1, * idx2;
float * vals;
uint32_t tablesize;
and I'd like to sort it in place with a given comparison function that's a function of idx1 and idx2. Can this be done using std::sort?
Specifically, each nonzero entry (i,j) with value v in the sparse matrix is stored by placing i in idx1, j in idx2, and v in the corresponding entry in vals. I'd like to then sort these entries according to (i1, j1, v1) <= (i2, j2, v2) if
(i1 < i2) || (i1==i2 && j1 <= j2)
The examples I've been able to scrounge up of using std::sort on nonstandard datatypes assume that each item being compared is a single instance of a class; here each item is represented by three values in different arrays.
Upvotes: 8
Views: 1565
Reputation: 70516
If you have to keep using your existing data structure, which is essentially a std::tuple
of three std::vector
s, using boost::zip_iterator
would seem to be the way to go. A zip_iterator
treats three iterators (two to indices and one to a value) as a single tuple, and you can use a custom comparison function object to sort your data in-place. Alas, boost::zip_iterator
can't be used with std::sort
, as explained in this Q&A, because it can't be written into.
This means that you would have to write your own zip_iterator class that can be used with std::sort
. Note that it is not a trivial exercise, see this Q&A and/or this paper.
It is a lot easier to sort a std::vector
of a std::tuple
. My attempt below uses a std::tuple
of two indices and a value, and stores those entries into a std::vector
. For the sorting, I use a C++14 generic lambda that forwards the two indices into a smaller tuple and compares those lexicographically (i.e. first on row-index, then on column-index) using the library operator<
of std::tuple
.
#include <algorithm>
#include <iostream>
#include <tuple>
#include <vector>
using index = uint32_t;
using value = float;
using sparse_entry = std::tuple<index, index, value>;
using sparse_matrix = std::vector<sparse_entry>;
int main()
{
// sparse 3x3 matrix
auto m = sparse_matrix {
std::make_tuple( 1, 1, -2.2),
std::make_tuple( 1, 0, 42 ),
std::make_tuple( 0, 2, 3.4),
std::make_tuple( 0, 1, 1.7)
};
// sort by row-index, then column-index
std::sort(begin(m), end(m), [](auto const& L, auto const& R) {
return
std::forward_as_tuple(std::get<0>(L), std::get<1>(L)) <
std::forward_as_tuple(std::get<0>(R), std::get<1>(R))
;
});
for (auto const& elem : m)
std::cout << "{ " << std::get<0>(elem) << ", " << std::get<1>(elem) << ", " << std::get<2>(elem) << "}, \n";
}
If your application can use this transformed data layout (and there maybe cache-performance reasons why it can't), then the above code will do the sorting as you want it.
NOTE: as @Casey mentions, you can also use std::tie
instead of std::forward_as_tuple
, but that can bite you when you change sparse_entry
into a full-fledged user-defined class with getters returning by-value.
Upvotes: 1
Reputation: 16290
It is unfortunately quite difficult to convince std::sort
, or any of the standard library, to work with striped data. It is designed to assume that data can be copied via a single =
, moved via one move
or swapped via one swap
.
Your best bet is to use boost::iterator_facade
to write a custom iterator class which wraps the data, and hides the striped data format from std::sort
. I've wanted to do something similar in the past but my workspace does not allow us to use boost
. EDIT: when your facade is dereferenced, it will probably need to create some sort of proxy object that can be assigned/moved/swapped and will do the right thing to each of the stripe arrays. It's not trivial.
The next best bet is to make an array of int
s from zero to N, each one representing an index into your striped data array. Write a custom functor to std::sort
which sorts this array to match your criteria. It's obviously far from ideal when you have such a large data set.
Upvotes: 3