Reputation: 6459
The name says it all, but to elaborate I have a list of vectors with timestamps. They are coming in mostly-sorted but will have some out of order values. I want output them in an ordered manner, but the vectors will be coming in streaming and I don't want to large a buffer as I want to output my results in a timely manner.
So I want to keep a sort of 'look ahead' list with N vectors in it. As I read in new vectors I want to insert it into the list and then pop the oldest vector from the top of the list to output, so that the list stays a constant N vectors long.
When I insert into the list I want the vector to be sorted and added at the correct location within the list, as I assume this is the most efficent method.
I need good efficency, but don't want to waste too long implementing and testing. So I'm interested both in easy solutions (such as reusing existing C++ structures if they exist) as well as harder to implement solutions if they can give a noticable speed boost. I would prefer to stick to standard C++, but if there is a boost or similar library that does exactly what I need I would love to hear about it just in case.
Thank you.
EDIT: I appreciate all suggestions. However, I neglected to state that timestamps are not unique. The timestamps have only second precision, so it's actually very likely that I get multiple vectors with the same timestamp. In this case I would prefer to preserve their order, though it's not necessary.
Upvotes: 3
Views: 702
Reputation: 2909
If you were to do it in one big buffer with one big sort, Timsort would be excellent. It's able to take advantage of partial sorting. But you said you don't need that.
If you need things to remain manageable without a sort inside a loop, you're better off with something like a treap or red-black tree.
Treaps are fast on average (I recently did a performance comparison of tree datastructures in Python under a number of different conditions, and found that treaps were always either fastest or second fastest on average - two others were sometimes a little faster than treaps depending on workload, but not consistently so)
Red-black trees reportedly give operation times with low standard deviations (they're kinda slow compared to a treap on average, but if this is a real time or interactive app, the red-black tree might be better for its low operation time variabilty).
Upvotes: 0
Reputation: 7744
Take a look at std::multiset
class.
You should check its insert methods:
#include <set>
#include <functional>
const size_t max_item_number = 100;
struct your_type
{
std::string str;
time_t datetime;
};
class your_less : std::binary_function<your_type,your_type,bool>
{
public:
bool operator()( const your_type &left, const your_type &right ) const
{
return ( left.datetime < right.datetime );
}
};
std::multiset<your_type,your_less> store;
std::multiset<your_type,your_less>::iterator helper = store.begin();
helper = store.insert( helper, new_value );
helper = store.insert( helper, new_value );
// fixed size: remove the oldest value
// you could use it e.g. in loop
if ( store.size() == max_item_number )
{
store.erase( store.begin() );
helper = store.begin();
}
In this way if the stream is ordered, the insert time could be constant.
Upvotes: 3
Reputation: 2394
Easy option: priority_queue O(lg n) insert and extract min and a lot faster than set/multiset (apx. 3times for integers) and has smaller memory footprint
If the input is almost sorted, than you can use some variation of insert sort. You just keep sorted deque and insert things somewhere back and pop mins from front.
Upvotes: 1