cprogrammer
cprogrammer

Reputation: 5675

Given an array, can I find in O(n) the longest range, whose endpoints are the greatest values in the range?

For a given array of integers, find the maximum distance between 2 points (i and j) that have higher values ​​than any element between them.

Example:

values: 0 10  8  9  6  7  4 10  0
index : 0  1  2  3  4  5  6  7  8 

for the values above the solution is i=1, j=7, but

I can't see a solution in O(n) ... anyone ?

Upvotes: 33

Views: 7204

Answers (7)

Rafał Dowgird
Rafał Dowgird

Reputation: 45131

A simple stack based solution. Iterate over the array left to right, with the stack holding elements (technically, indexes, but use values for comparisons) that are either:

  1. The largest from the left (i.e. no larger or equal element between the start of the array and the element)
  2. The largest since the previous element on the stack.

When processing the next element x, pop elements smaller than x as long as they are from the category 2. above, then push x on the stack. Obviously you need to keep the current max to be able to discern between category 2 and 1 in constant time.

The processing above is O(n) - each element can be pushed at most once and popped at most once. Having the final stack, just check neighboring pairs (on the stack) - one of the pairs is the solution. This too is O(n).

Here's a picture with what should stay on the stack (the fat rectangles) after the whole scan of the array:

Stack-bars

(There's a small mistake in the above picture: fourth bar from the left should either be thick or drawn shorter than the first one, sorry)

Why this works: the final stack contains all elements of the input array that are not between any two larger elements (I've skipped the case of an element between two equal elements) . The solution is obviously a neighboring pair of such elements.

Here's an implementation in Python:

from collections import namedtuple

E = namedtuple('E', 'i x')

def maxrange(iterable):
    stack = [E(0, None)]*2 # push sentinel values
    maxsofar = None

    top = lambda: stack[-1] # peek at the top element on the stack

    for i, x in enumerate(iterable):
        while top().x < x and top().x < maxsofar:
            stack.pop()
        stack.append(E(i, x)) # push
        maxsofar = max(maxsofar, x)

    return max(b.i-a.i for a,b in zip(stack, stack[1:]))

Example:

>>> maxrange([2,1,3])
2

Upvotes: 21

Thomas Ahle
Thomas Ahle

Reputation: 31614

Rafals solution is good, but we can do without the stack and so save some memory. Here is a short and efficient implementation in O(n) time:

def highDist(seq):
    res, ltr, rtl = 0, 0, 0
    for i in range(len(seq)):
        if seq[i] >= seq[ltr]:
            res = max(res, i-ltr)
            ltr = i
        if seq[-i-1] >= seq[-rtl-1]:
            res = max(res, i-rtl)
            rtl = i
    return res

Run on the example input:

>>> print highDist([0, 10, 8, 9, 6, 7, 4, 10, 0])
6
>>> print highDist([0, 10, 8, 9, 6, 7, 4, 9, 0])
4
>>> print highDist([0, 10, 8, 9, 6, 7, 4, 7, 0])
2
>>> print highDist([])
0

The trick is, that if we have two points a and b s.t. everything between them is smaller than them, then the maximum distance we are searching for, is either |b-a| or entirely outside the range. Hence if we partition the entire sequence that way, one of them is the range we are searching for.

We can make the partition easily be creating an upsequence from each end.

Upvotes: 5

cprogrammer
cprogrammer

Reputation: 5675

I have write a solution for the problem. So far looks valid, works with all input I have tried. This is the code in C++. I wanted a solution as clean and simple I could get so didn't use Cartesian tree or stack solution suggested but rather a simpler approach: parsing first time and get the list of valid intervals (as suggested by Rafał Dowgird) and a second time to determine the max len interval that is the solution.

void Solution(const int* vData, int vLen) { typedef std::pair Interval; typedef std::vector ListaIntervaleType; ListaIntervaleType ListaIntervale;

/************************************************************************/ /* Get valid Intervals */ /************************************************************************/ #define IS_LAST_ELEM (i == vLen-1) int iIdxStart = -1; for (int i=1; i < vLen; ++i) { if (-1 == iIdxStart) { /* Searching for Interval START */ if (!IS_LAST_ELEM) { if (vData[i+1] < vData[i]) { iIdxStart = i; } } } else { /* Searching for Interval END */ if (vData[iIdxStart] <= vData[i] || IS_LAST_ELEM) { /* stop condition: crt val > start value*/ ListaIntervale.push_back(std::make_pair(iIdxStart,i)); if (!IS_LAST_ELEM && vData[i+1] < vData[i]) { iIdxStart = i; } else { iIdxStart = -1; } } } } /************************************************************************/ /* Concatenate intervals */ /************************************************************************/ //int iMaxLenIntervalIdx = -1; //int iMaxLenIntervalVal = 0; ListaIntervaleType::iterator crt; //for (crt=ListaIntervale.begin(); crt!=ListaIntervale.end(); crt++) //{ // ListaIntervaleType::iterator next = crt + 1; // if (next != ListaIntervale.end()) // { // if (crt->second == next->first) // { // if (crt->second < crt->first && // crt->second < next->second) // { // //concatenam // crt->second = next->second; // crt = ListaIntervale.erase(next); // } // } // } //} /************************************************************************/ /* Get Max */ /************************************************************************/ ListaIntervaleType::iterator solution = ListaIntervale.begin(); int iMaxLen = 0; for (crt=ListaIntervale.begin(); crt!=ListaIntervale.end(); crt++) { int iCrtLen = crt->second - crt->first; if (iCrtLen > iMaxLen) { iMaxLen = iCrtLen; solution = crt; } } /************************************************************************/ /* Print Solution */ /************************************************************************/ if (solution != ListaIntervale.end()) { wprintf(L"Solution idx=[%d-%d] val=[%d-%d]", solution->first, solution->second, vData[solution->first], vData[solution->second]); } else { wprintf(L"Solution NOT FOUND"); } return;

}

Upvotes: 0

micans
micans

Reputation: 1116

I am not sure whether the following solution is O(n), but it is certainly 'almost O(n)', and also very simple, just a few lines of code. It is based on the following observation. For any pair of indices (i,j), draw an arc between them if the interval [i,j] has the sought-after property. Now observe that it is possible to draw these arcs so that they do not intersect. Then observe that the smallest pairs are (i,i+1) - all of these have the sought property. Next, a bigger pair can always be constructed by contraction of two smaller pairs. This leads to the following structure: Start with the pairs (i,i+1) in a linked list. Go over the linked list repeatedly, and try to contract consecutive links. This algorithm is order-indepedent because of the non-intersection property. Perl code follows.

#!/usr/local/bin/perl -w
use strict;

my @values    = (0, 10, 8, 9, 6, 7, 4, 10, 0);           # original example.
@values       = map { int(100 * rand()) } 1..100;        # random testing.
my $nvalues   = @values;
my @intervals = (1..$nvalues-1);       # this encodes a linked list 0->1, 1->2, N-2->N-1

my $change = 0;
my ($maxi, $maxj) = (0, 1);            # initial solution
my $maxdelta = 1;

do {
   my ($jstart, $j) = (0, $intervals[0]);
   $change = 0;
   while ($j < $nvalues-1) {
      my $jnext = $intervals[$j];
      while ($jnext < $nvalues -1 && $values[$j] == $values[$jnext]) {
          $jnext = $intervals[$jnext];   # lookahead to skip intervals with identical boundaries
      }
      if ($values[$j] < $values[$jstart] && $values[$j] < $values[$jnext]) {
         $intervals[$jstart] = $jnext;                   # contraction step
         print STDERR "contraction to $j $jnext\n";
         $change = 1;
         $j = $jnext;
         if ($jnext - $jstart > $maxdelta) {
            $maxdelta = $jnext - $jstart;
            ($maxi, $maxj) = ($jstart, $jnext);
         }
      }
      else {
         ($jstart, $j) = ($j, $intervals[$j]);
      }
   }
   print STDERR "---\n";
}
while ($change);


my $jstart = 0;

while ($jstart < $nvalues -1) {
   my $jnext = $intervals[$jstart];
   local $, = " ";
   print STDERR @values[$jstart..$jnext], "\n";
   $jstart = $jnext;
}

print STDERR "solution $maxi $maxj\n";

Upvotes: 0

ACEG
ACEG

Reputation: 2041

ETA: Found some errors, sorry. I'll get back at this after work, it is definitely an intriguing problem.

Written in sort of pseudocode; the idea is to move a window of three numbers over the array and perform certain comparisons, then update your left and right positions accordingly.

// Define the initial window
w1=a[0], w2=a[1], w3=a[2]
// Will hold the length of the current maximum interval
delta_max = 0

// Holds the value of the current interval's leftmost value
left=max(w1,w2,w3)
// Holds the position of the current interval's leftmost value
lpos= *position of the chosen wi above*

// Holds the position of the current interval's rightmost value
rpos = lpos + 1
// Holds the value of the current interval's rightmost value
right = a[rpos]

i = 0
// Holds the position of the max interval's leftmost value
lmax = 0
// Holds the position of the max interval's rightmost value
rmax = 0

while (i<n-3) do
    i = i + 3
    w1=a[i], w2=a[i+1], w3=a[i+2]

    if (w1<left) and (w2<left) and (w3<left)
        right = w3
        rpos = i + 2
    else
        // Set the new left to the first value in the window that is bigger than the current left
        if (w1>left): left = w1, lpos = i
        else
            if (w2>left): left = w2, lpos = i+1
            else
                if (w3>left): left = w3, lpos = i+2


    delta = rpos-lpos
    if delta>dmax: dmax = delta, lmax = lpos, rmax = rpos
    lpos = rpos
    rpos = lpos + 1
    right = a[rpos]
end

Upvotes: 0

Artem Volkhin
Artem Volkhin

Reputation: 1372

The linear solution is quiet simple. We will use two pointer, for endings of segment, such that every element on it is not more than the the first element. For the fixed first element (first pointer) we move second pointer right until it points to smaller or equal to the first element. If element at second pointer is great or equal to the first, we can update answer with current segment length. And if it points to an element strictly greater than first one, we have to move first pointer to current position, because no mo segment can start at it's last position - current element will be more than segment's endpoint.

This algorithm find maximum by length segment with left element less or equal to right element, so we need to repeat same actions one more time - walking from right to left.

Complexity - O(n), just moving n-1 times second pointer and no more than n-1 times first pointer.

If my idea is not clear ask any questions.

Upvotes: 0

templatetypedef
templatetypedef

Reputation: 373002

I believe that the following algorithm runs in O(n) time and produces an optimal solution. It's a bit tricky, so I'll split the logic into two cases, one in which we consider the case where there are no duplicate values in the array, and one in which we considered a case where duplicates are allowed.

The key data structure we'll be using is the Cartesian tree, which is a max-heap built out of a range of data with the property that an inorder walk of the nodes of the tree produce the values of the sequence in the order in which they appear. The critical detail is that it's possible to build a Cartesian tree for a sequence of elements in O(n) time.

As an example, given the sequence 4 1 0 3 2, we would get this Cartesian tree:

        4
         \
          3
         / \
        1   2
         \
          0

Notice that the inorder walk does indeed give back 4 1 0 3 2.

I now claim that the endpoints of the range we're looking for must be a parent/child pair in the Cartesian tree (that is, either the first node is a parent of the second node or vice-versa). Note that we are not looking for a node and any ancestor, but specifically a parent/child pair. To prove this, I prove the following two claims:

Claim 1: Any parent/child pair in the Cartesian tree defines a range of values in the original sequence that does not have any intermediary values greater than the endpoints.

Claim 2: Any range of values in the sequence that does not have any intermediary values greater than its endpoints must have those endpoints as a parent/child pair in the Cartesian tree.

Taken together, these collectively prove that the range we're looking for must be a parent/child pair in the Cartesian tree. This then gives us an easy linear-time algorithm for solving the problem when all the values are distinct. First, in O(n) time, build up a Cartesian tree for the range. Next, recursively explore the tree, and for each parent/child pair, find the distance between those pairs, taking the largest range we find. This second step also runs in O(n), since a Cartesian tree is a binary tree and so the number of edges is O(n).

Proof of claim one: A parent/child pair must look like either

 u
  \
   v
  / \
 A   B

or

    u
   /
  v
 / \
A   B

Recall that a Cartesian tree stores its elements in a way that an inorder traversal of the elements of the tree yield the elements of the array used to produce the tree in the order in which they appear in the array. This means that in case (1), we're looking at a range of elements starting with u, containing all of A, and concluding with b. Similarly, in case (2), the range starts with v, then contains all of B, then terminates with u. We prove the claim that u and v have no values in-between them that are larger than either u or v by contradiction. Suppose this were the case and that the value w is bigger than either u or v. By the definition of a Cartesian tree, if w is in-between u and v in the original sequence, then in case (1) it must be in subtree A and in case (2) it must be in subtree B. But a Cartesian tree is a max-heap, and so in case (1) both u and v are bigger than everything in A, and in case (2) both u and v are bigger than everything in B, a contradiction. Thus the range of values between any parent/child pair must be no larger than either the parent or the child.

Proof of Claim 2: By contrapositive; we show that if there is a subsequence of the original array beginning with u and ending with v that contains an element larger w than either u or v, then u and v cannot be a parent/child or child/parent pair in the Cartesian tree. The proof is virtually identical to above - if u and v were a parent/child pair, then in case (1) above w would have to be in A and in case (2) w would have to be in B, in both cases violating the fact that a Cartesian tree is a max-heap.

The above proofs show us that if all the values are distinct, we can solve this problem in linear time by just building a Cartesian tree and doing a simple tree walk. But what happens if the elements are allowed to be duplicates, as in your original problem statement? In that case, we can use a modified Cartesian tree structure. We'll allow nodes to be combined into a "metanode" of multiple different Cartesian tree values that share the same value. For example, the sequence 2 7 1 7 8 0 3 8 2 would look like this:

                  [8 ------- 8]
                  / \         \
                 /   3         2
                /   /
               /   0
              /
          [7 -- 7]
            / \
           2   1

Here, the root is a metanode consisting of two nodes with value 8, and the first 8 metanode consists of two 7 metanodes.

For notational purposes, let the "leftest" node of a metanode be the node furthest to the left in the metanode, and let the "rightest" node of a metanode be the node furthest to the right in a metanode.

In this modified Cartesian tree, we can define an inorder walk of the nodes as one that visits all of the nodes in a metanode in left-to-right order, doing an inorder walk of each of the nodes it contains. We then enforce that the modified Cartesian tree a strict max-heap (every node must be strictly greater than any of its children) and that an inorder walk of the tree yields the values in the same order in which they appear in the original sequence.

Notice that this generalized construction contains our original solution as a special case - if all the values are distinct, nothing is different in this tree structure. I'll also state without proof that it's possible to build up one of these structures in O(n) by a straightforward modification of the standard Cartesian tree algorithm.

In this modified tree structure, a range of values with no intervening values at least as large as the endpoints corresponds to either:

  1. A parent and the rightest node of its left child.
  2. A parent and the leftest node of its right child.
  3. Two adjacent nodes in the same metanode.

The proof of the first two claims is pretty much a straightforward modification of the proof for the above case. The difference is that instead of the contradiction proof saying that it would violate the max-heap property, instead you would claim that it violates the strong max-heap property. You would also say that any equal values in the middle of the range would have to manifest themselves as nodes that would prevent the range endpoints from being leftest or rightest nodes in a metanode. The proof of the third claim is (roughly) that any nodes in-between the two nodes must be strictly smaller than the endpoints, and if there were an equal value in the middle then the two nodes wouldn't be adjacent metanodes.

Given this observation, we can solve the original problem in O(n) time as follows. First, build up a generalized Cartesian tree from the input range. Next, walk the tree and look at all of the indicated pairs of elements that might be the range, picking the largest one that you find. Since for each node only a constant number of other nodes need to be checked (its parent, left and right siblings, and children give at most five nodes to check), this can be done in O(n) time for a net runtime of O(n).

Whew! That was an awesome problem. I don't know if this is an ideal solution, but it at least proves that it's possible to do this in O(n) time. If someone comes up with a better answer, I'd love to see it.

Upvotes: 21

Related Questions