Mickey
Mickey

Reputation: 1515

Finding median from two sorted arrays with different lengths

The problem of finding the median from a two given sorted arrays of the same length is pretty known and easy (and asked here many times before). (This can be done by a simple recursive algorithm)

My question is how to find the median when the two arrays are not of the same length efficiently (i.e not sorting them both with mergesort and finding the median)

Also, what about finding the median of k sorted array of the same length? is there an efficient algorithm?

I tried to answer both the last question but didn't find a good solution, thanks!

Upvotes: 0

Views: 698

Answers (2)

Birchlabs
Birchlabs

Reputation: 8056

You can find the median of the union of two different-length sorted arrays m and n in O(log2(min(m+n))) time. Essentially you search for a split point in each array that both small splits contribute the same number of elements as both big splits. This identifies an equal number of elements above and below the median.

Searching for the ideal split point can be done with binary search (the sortedness ensures that you can approach efficiently by checking whether you overshot or undershot).
Finding split point in one array gives you the other array's split point for free (because you know how many elements you need to balance the elements you've selected from the first array).

Once you've found the split points in each array that give "all elements below median" and "all elements above median", you can calculate median by inspecting the boundary between those (i.e. grab the middle element if the union length is odd, otherwise average the elements directly on the boundary).

I translated into JavaScript a Python algorithm from the comments of this leetcode discussion (zzg_zzm's tweak atop stellari's algorithm). But I chose more intuitive variable names, and added comments.

Not exhaustively tested, but worked for the few inputs I tried.

function findUnionMedianSorted(smallArr, largeArr) {  
  // there are an equal number of elements below and above median
  // we need to find partitions on arr1 and arr2 such that arr1 and arr2
  // together contribute an equal number of submedian and supermedian elements

  // because fitness of partition point is transitive,
  // we can use binary search to approach optimal partition

  // we use the smaller array as a basis for finding the first partition,
  // since this eliminates situation where small array lacks enough elements to balance the partition

  // global median can then be calculated as:
  // avg(elementBelowMedian, elementAboveMedian)
  // so we must find also the elements that flank the median

  // ensure smallArr is the smaller array
  if (largeArr.length < smallArr.length) {
    return findUnionMedianSorted(largeArr, smallArr)
  }

  const unionArrLen = smallArr.length + largeArr.length

  // indices at which we would consider performing a cut
  let smallArrCutStartIx = 0, smallArrCutEndIx = smallArr.length
  while (smallArrCutStartIx <= smallArrCutEndIx) {
    // cut we are evaluating
    // midpoint of current search space of possible smallArr cuts
    const smallArrCutIx = Math.floor((smallArrCutStartIx + smallArrCutEndIx)/2)
    // partition on largeArr must provide same number of elements
    // above median as smallArr provides below median
    const largeArrCutIx = Math.floor(unionArrLen/2) - smallArrCutIx

    // smallArr and largeArr both submit a candidate for "what may be the element preceding the median"
    // this is the element preceding that array's cut
    // if there is no such element: we are cutting at an end of the array, so we have no element to offer
    // thus: we set extreme value such that comparisons favor the alternative (candidate from other array)
    const smallArrElementBeforeMedian = smallArrCutIx === 0
    ? Number.MIN_SAFE_INTEGER
    : smallArr[smallArrCutIx-1]
    const smallArrElementAfterMedian = smallArrCutIx === smallArr.length
    ? Number.MAX_SAFE_INTEGER
    : smallArr[smallArrCutIx]

    const largeArrElementBeforeMedian = largeArrCutIx === 0
    ? Number.MIN_SAFE_INTEGER
    : largeArr[largeArrCutIx-1]
    const largeArrElementAfterMedian = largeArrCutIx === largeArr.length
    ? Number.MAX_SAFE_INTEGER
    : largeArr[largeArrCutIx]

    // elements before median must be smaller than elements after median
    // this is already guaranteed within-array (elements are sorted)
    // but we check whether our proposed cut violates this across the two proposed arrays
    if (smallArrElementBeforeMedian > largeArrElementAfterMedian) {
      // our cut on smallArr is at too high an index
      // eliminate all cut locations equal to or greater than the cut index we tried
      smallArrCutEndIx = smallArrCutIx-1
      continue
    }
    if (smallArrElementAfterMedian < largeArrElementBeforeMedian) {
      // our cut on smallArr is at too low an index
      // eliminate all cut locations equal to or less than the cut index we tried
      smallArrCutStartIx = smallArrCutIx+1
      continue
    }

    // both candidates will be present in the union array,
    // but only the smaller one will be directly after the median
    const elementAfterMedian = Math.min(smallArrElementAfterMedian, largeArrElementAfterMedian)

    // does the union array have one middle or two?
    if (unionArrLen %2 === 1) {
      // odd length; one middle

      // why do we prefer `elementAfterMedian` and not `elementBeforeMedian`?
      // the material I adapted this from did not explain, so what follows is my (shaky) guess:

      // our "after" index points to the midpoint of a search space, so for odd-length arrays
      // it is actually an "equal to" index.
      return elementAfterMedian
    }

    // both candidates will be present in the union array,
    // but only the larger one will be directly before the median
    const elementBeforeMedian = Math.max(smallArrElementBeforeMedian, largeArrElementBeforeMedian)

    // average the two middles
    return (elementBeforeMedian + elementAfterMedian) / 2
  }
}

As for:

Also, what about finding the median of k sorted array of the same length? is there an efficient algorithm?

This is big enough to merit posting a separate question.

Upvotes: 0

mcdowella
mcdowella

Reputation: 19601

If you pick a value from one of the arrays and do a binary search for it in the other array then you will know how many values in each array are above and below the chosen value, which is enough to tell you how many values in the combination of the two are above and below the chosen value.

So you can do a binary chop on the first array and find out which of its values is nearest the overall median, and you can do a binary chop on the second array and find out which of its values is nearest the overall median, and one of these two arrays must contain the overall median.

At worst the cost of this is two outer binary chops, where each guess costs you an inner binary chop, so O(log^2(n)).

There are a couple of ideas which might give at least a practical speedup over this:

1) When doing an inner binary chop, you don't necessarily need to find an exact match. As soon as you have reduced the interval of values in which a match will lie enough to tell whether the value chosen is above or below the target median, you can return any value within that range.

2) You could look to see if the interval returned from the previous call of the inner binary chop was a feasible starting point for the current call. If it doesn't enclose the value searched for, perhaps an interval of the same size to one side or the other of it does.

Upvotes: 0

Related Questions