Charles
Charles

Reputation: 11479

Data structure for avoiding frequent pushing/popping when searching for successive minima

I'm looking for an online algorithm for processing more data than I can reasonably store.

I just want to keep data points n where the value v[n] is smaller than any later value. (The values are generally increasing.)

The obvious way to do this (not to say the only way, or the right way) is to use a stack. For each new point, pop points off the stack while their values are more than the current point's value and then push the current point onto the stack.

But the data is very sparse. In a quick test only about 3 MB were saved per TB.

Upvotes: 0

Views: 56

Answers (1)

trincot
trincot

Reputation: 350270

You could process the data in chunks. Define the size of a chunk such that the expected result size is guaranteed to fit in it. So if we say ten million of values are considered a chunk, then we are also saying the number of minima will never exceed 10 million. Then proceed as follows:

  • Reserve an array for storing 10 million values
  • As long as there is more data, keep repeating the following steps
  • Populate the free part of the array with input values
  • Go backwards through the whole array to find the minima. As you noted this can be done without stack. It can be done inplace in the array, by saving the found minima at the right side of the array.
  • Move the those minima to the start of the array, leaving a free part at the right side of the array, which can be populated in the next iteration with new input values.

At the end you'll have the minima at the start of the array.

This can be optimised by stopping the backwards iteration when arriving in the part of the array that contains the previous iteration's result, and the value to compare with is also from that part. The part at the right of the array should then be moved just after this point in the array.

This algorithm could run faster than your stack version, assuming that reading a chunk of input data in an array can be done very fast, and that moving a part of an array to the left can also be done very fast (memcopy type of action).

Upvotes: 1

Related Questions