Reputation: 11479
I'm looking for an online algorithm for processing more data than I can reasonably store.
I just want to keep data points n where the value v[n] is smaller than any later value. (The values are generally increasing.)
The obvious way to do this (not to say the only way, or the right way) is to use a stack: for each new point, pop points off the stack while their values are greater than the current point's value, then push the current point onto the stack.
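A minimal sketch of that stack approach (the function name and the (index, value) input shape are my own choices; ties are popped too, since a kept value must be strictly smaller than everything that follows):

```python
def suffix_minima_stack(points):
    """Keep the (n, v) pairs whose value is smaller than every later value.

    `points` is any iterable of (index, value) pairs; the surviving pairs
    are returned in index order.
    """
    stack = []
    for n, v in points:
        # Drop earlier points that are not smaller than the new value.
        while stack and stack[-1][1] >= v:
            stack.pop()
        stack.append((n, v))
    return stack
```

For example, suffix_minima_stack(enumerate([5, 3, 4, 2, 7, 8])) returns [(3, 2), (4, 7), (5, 8)].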
But the retained data is very sparse: in a quick test, only about 3 MB were kept per TB of input.
Upvotes: 0
Views: 56
Reputation: 350270
You could process the data in chunks. Define the chunk size such that the result is guaranteed to fit in it: if we say ten million values make a chunk, then we are also saying the number of minima will never exceed ten million. Then proceed as follows:
1. Read the next chunk of input into the array, placing it right after the minima collected so far (which occupy the start of the array).
2. Iterate backwards over the array, tracking the minimum value seen so far; an element survives only if it is smaller than that running minimum.
3. Move the surviving elements to the start of the array, so they become the result so far.
4. Repeat from step 1 until the input is exhausted.
At the end you'll have the minima at the start of the array.
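As a rough illustration (my own sketch, not the answerer's code; it tracks only the values, but carrying the original indices along works the same way):

```python
from itertools import islice

def minima_in_chunks(values, chunk_size=10_000_000):
    """Keep the values that are smaller than every later value, reading the
    input `chunk_size` items at a time so only the current chunk plus the
    running result has to fit in memory."""
    it = iter(values)
    result = []                          # minima found so far, increasing
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        buf = result + chunk             # previous result at the start of the array

        # Backwards pass: an element survives only if it is smaller than
        # everything to its right, tracked by a running minimum.
        running_min = float("inf")
        kept = []
        for v in reversed(buf):
            if v < running_min:
                kept.append(v)
                running_min = v
        kept.reverse()
        result = kept                    # the new result sits at the start again
    return result
```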
This can be optimised by stopping the backwards iteration once it has reached the part of the array that holds the previous iteration's result and the value being compared against also comes from that part. The surviving elements from the right of the array should then be moved so they sit just after that point in the array.
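In terms of the sketch above, that early stop could look roughly like this (again my own interpretation): the previous result is strictly increasing, so once the running minimum comes from inside it, everything further to the left is already known to survive.

```python
from itertools import islice

def minima_in_chunks_early_stop(values, chunk_size=10_000_000):
    """Like minima_in_chunks, but the backwards pass stops as soon as the
    running minimum comes from the previous result; the untouched prefix is
    kept as-is and the survivors from the right are appended to it."""
    it = iter(values)
    result = []
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        prev_len = len(result)           # previous result occupies buf[:prev_len]
        buf = result + chunk

        running_min = float("inf")
        min_src = len(buf)               # index where the running minimum was found
        kept = []
        i = len(buf) - 1
        # Stop once the running minimum was found inside the previous result.
        while i >= 0 and min_src >= prev_len:
            if buf[i] < running_min:
                kept.append(buf[i])
                running_min = buf[i]
                min_src = i
            i -= 1
        kept.reverse()
        # "Move the part at the right just after this point": keep the
        # untouched, already-increasing prefix and append the survivors.
        result = buf[:i + 1] + kept
    return result
```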
This algorithm could run faster than your stack version, assuming that reading a chunk of input data into an array can be done very fast, and that moving a part of an array to the left can also be done very fast (a memcpy-type operation).
Upvotes: 1