tiom
tiom

Reputation: 137

c++ improve vector sorting by presorting with old vector

I have a a vector of pair with the following typdef

typedef std::pair<double, int> myPairType;
typedef std::vector<myPairType> myVectorType;
myVectorType myVector;

I fill this vector with double values and the int part of the pair is an index. The vector then looks like this

0.6594 1
0.5434 2
0.5245 3
0.8431 4
...

My program has a number of time steps with slight variations in the double values and every time step I sort this vector with std::sort to something like this.

0.5245 3
0.5434 2
0.6594 1
0.8431 4

The idea is now to somehow use the vector from the last time step (the "old vector, already sorted) to presort the current vector (the new vector, not yet sorted). And use an insertions sort or tim sort to sort the "rest" of the then presorted vector.

Is this somehow possible? I couldn't find a function to order the "new" vector of pairs by one part (the int part). And if it is possible could this be faster then sorting the whole unsorted "new" vector?

Thanks for any pointers into the right direction.

tiom

UPDATE

First of all thanks for all the suggestions and code examples. I will have a look at each of them and do some benchmarking if they will speed up the process.

Since there where some questions regarding the vectors I will try to explain in more detail what I want to accomplish.

As I said I have a number if time steps 1 to n. For every time step I have a vector of double data values with approximately 260000 elements.

In every time step I add an index to this vector which will result in a vector of pairs <double, int>. See the following code snippet.

typedef typename myVectorType::iterator myVectorTypeIterator; // iterator for myVector
std::vector<double> vectorData; // holds the double data values
myVectorType myVector(vectorData.size()); // vector of pairs <double, int>

myVectorTypeIterator myVectorIter = myVector.begin();
// generating of the index
for (int i = 0; i < vectorData.size(); ++i) {
    myVectorIter->first = vectorData[i];
    myVectorIter->second = i;
    ++myVectorIter;
}

std::sort(myVector.begin(), myVector.end() );

(The index is 0 based. Sorry for my initial mistake in the example above) I do this for every time step and then sort this vector of pairs with std::sort.

The idea was now to use the sorted vector of pairs of time step j-1 (lets call it vectorOld) in time step j as a "presorter" for the "new" myVector since I assume the ordering of the sorted "new" myVector of time step j will only differ in some cases from the already sorted vectorOld of time step j-1.

With "presorter" I mean to rearrange the pairs in the "new" myVector into a vector presortedVector of type myVectorType by the same index order as the vectorOld and then let a tim sort or some similar sorting algorithm that is good in presorted date do the rest of the sorting.

Some data examples: This is what the beginning of myVector looks like in time step j-1 before the sorting.

0.0688015 0
0.0832928 1
0.0482259 2
0.142874 3
0.314859 4
0.332909 5
...

And after the sorting

0.000102207 23836
0.000107378 256594
0.00010781 51300
0.000109315 95454
0.000109792 102172
...

So I in the next time step j this is my vectorOld and I like to take the element with index 23836 of the "new" myVector and put it in the first place of the presortedVector, element with index 256594 should be the second element in presortedVector and so on. But the elements have to keep their original index. So 256594 will not be index 0 but only element 0 in presortedVector still with index 256594

I hope this is a better explanation of my plan.

Upvotes: 1

Views: 589

Answers (4)

luk32
luk32

Reputation: 16070

Well you can create new vector with the order of the old and then use algorithms that has good complexity for (nearly) sorted inputs for the restoration of order.

Below I put an example of how it works, with Mark's function as restore_order:

#include <iostream>
#include <algorithm>
#include <vector>
#include <utility>
using namespace std;

typedef std::pair<double, int> myPairType;
typedef std::vector<myPairType> myVectorType;

void outputMV(const myVectorType& vect, std::ostream& out)
{
    for(const auto& element : vect) 
        out << element.first << " " << element.second << '\n';
}

//https://stackoverflow.com/a/28813905/1133179
template<class RandomIt, class Compare>
void restore_order(RandomIt first, RandomIt last, Compare comp)
{
    RandomIt mid = std::is_sorted_until(first, last, comp);
    std::sort(mid, last, comp);
    std::inplace_merge(first, mid, last, comp);
}

int main() {
    myVectorType myVector = {{3.5,0},{1.4,1},{2.5,2},{1.0,3}};
    myVectorType mv2 = {{3.6,0},{1.35,1},{2.6,2},{1.36,3}};
    auto comparer = [] (const auto& lhs, const auto& rhs) { return lhs.first < rhs.first;};

    // make sure we didn't mess with the initial indexing
    int i = 0;
    for(auto& element : myVector) element.second = i++;
    i = 0;
    for(auto& element : mv2) element.second = i++;

    //sort the initial vector
    std::sort(myVector.begin(), myVector.end(), comparer);
    outputMV(myVector, cout);

    // this will replace each element of myVector with a corresponding  
    // value from mv2 using the old sorted order
    std::for_each(myVector.begin(), myVector.end(),
       [mv2] (auto& el) {el = mv2[el.second];}
    );

    // restore order in case it was different for the new vector
    restore_order(myVector.begin(), myVector.end(), comparer);
    outputMV(myVector, cout);
    return 0;
}

This works in O(n) up to the point of restore then. Then the trick is to use good function for it. A nice candidate will have good complexity for nearly sorted inputs. I used function that Mark Ransom posted, which works, but still isn't perfect.

It could get outperformed by bubble sort inspired method. Something like, iterate over each element, if the order between current and next element is wrong recursively swap current and next. However there is a bet on how much the order changes - if the order doesn't vary much you will stay close to O(2n), if does - you will go up to O(n^2).

I think the best would be an implementation of natural merge sort. That has best case (sorted input) O(n), and worst O(n log n).

Upvotes: 0

Mark Ransom
Mark Ransom

Reputation: 308246

First, scan through the sequence to find the first element that's smaller than the preceding one (either a loop, or C++11's std::is_sorted_until). This is the start of the unsorted portion. Use std::sort on the remainder, then merge the two halves with std::inplace_merge.

template<class RandomIt, class Compare>
void sort_new_elements(RandomIt first, RandomIt last, Compare comp)
{
    RandomIt mid = std::is_sorted_until(first, last, comp);
    std::sort(mid, last, comp);
    std::inplace_merge(first, mid, last, comp);
}

This should be more efficient than sorting the whole sequence indiscriminately, as long as the presorted sequence at the front is significantly larger than the unsorted part.

Upvotes: 1

Chris Drew
Chris Drew

Reputation: 15334

I have no idea if this could be faster than sorting the whole unsorted "new" vector. It will depend on the data.

But this will create a sorted copy of a new vector based on the order of an old vector:

myVectorType getSorted(const myVectorType& unsorted, const myVectorType& old) {
  myVectorType sorted(unsorted.size());

  auto matching_value 
    = [&unsorted](const myPairType& value)
    { return unsorted[value.second - 1]; };

  std::transform(old.begin(), old.end(), sorted.begin(), matching_value);
  return sorted;
}

You will then need to "finish" sorting this vector. I don't know how much quicker (if at all) this will be than sorting it from scratch.

Live demo.

Upvotes: 0

sehe
sehe

Reputation: 393164

Using the sorted vector would likely result in more comparisons (just to find a matching item).

What you seem to be looking for is a self-ordering container.

You could use a set (and remove/re-insert on modification).

Alternatively you could use Boost Multi Index which affords a bit more convenience (e.g. use a struct instead of the pair)

Upvotes: 0

Related Questions