Reputation: 31

Median Algorithm in O(log n)

How can we remove the median of a set in O(log n) time? Any ideas?

Upvotes: 3

Answers (9)

supercat

Reputation: 81347

If the set is sorted, finding the median requires O(1) item retrievals. If the items are in arbitrary sequence, it will not be possible to identify the median with certainty without examining the majority of the items. If one has examined most, but not all, of the items, that will allow one to guarantee that the median will be within some range [if the list contains duplicates, the upper and lower bounds may match], but examining the majority of the items in a list implies O(n) item retrievals.

If the information is held in a collection which is not fully ordered, but where certain ordering relationships are known, then finding the median may require anywhere between O(1) and O(n) item retrievals, depending upon the nature of the known ordering relations.

Upvotes: 18

ludo

Reputation: 1466

As mentioned in previous answers, there is no way to find the median without touching every element of the data structure. If the algorithm you are looking for must run sequentially, then the best you can do is O(n). The deterministic selection algorithm (median-of-medians, also known as BFPRT) solves the problem in O(n) in the worst case. You can find more about that here: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

However, the median-of-medians algorithm can be made to run faster than O(n) by parallelizing it. Due to its divide-and-conquer nature, the algorithm can be "easily" made parallel. For instance, when dividing the input array into groups of 5 elements, you could potentially launch a thread for each group, sort it, and find its median within that thread. When this step has finished, the threads are joined and the algorithm is run again on the newly formed array of medians.

Note that such a design would only be beneficial for really large data sets. The additional overhead of spawning and joining threads makes it impractical for smaller sets. This has a bit of insight: http://www.umiacs.umd.edu/research/EXPAR/papers/3494/node18.html

Note that you can find asymptotically faster algorithms out there, however they are not practical enough for daily use. Your best bet is the already mentioned sequential median-of-medians algorithm.
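As a rough illustration of the sequential median-of-medians idea described above, here is a minimal Java sketch. The class name MedianOfMedians and its method names are hypothetical, and the three-way partition is one possible choice rather than anything prescribed by this answer.

```java
import java.util.Arrays;

// Hypothetical sketch of deterministic selection (median of medians).
final class MedianOfMedians {
    // Returns the k-th smallest element (k is 0-based); works on a copy of a.
    static int select(int[] a, int k) {
        if (k < 0 || k >= a.length) throw new IllegalArgumentException("k out of range");
        int[] copy = a.clone();
        return select(copy, 0, copy.length - 1, k);
    }

    // Returns the value that would sit at index k if a[lo..hi] were sorted.
    private static int select(int[] a, int lo, int hi, int k) {
        while (true) {
            if (hi - lo < 5) {                 // at most 5 elements: sort directly
                Arrays.sort(a, lo, hi + 1);
                return a[k];
            }
            // Sort each group of 5 and move its median to the front of the range.
            int next = lo;
            for (int i = lo; i <= hi; i += 5) {
                int end = Math.min(i + 4, hi);
                Arrays.sort(a, i, end + 1);
                swap(a, next++, i + (end - i) / 2);
            }
            // The pivot is the median of the collected group medians.
            int pivot = select(a, lo, next - 1, lo + (next - 1 - lo) / 2);
            // Three-way partition: [ < pivot | == pivot | > pivot ].
            int lt = lo, j = lo, gt = hi;
            while (j <= gt) {
                if (a[j] < pivot) swap(a, lt++, j++);
                else if (a[j] > pivot) swap(a, j, gt--);
                else j++;
            }
            if (k < lt) hi = lt - 1;           // answer lies among the smaller elements
            else if (k > gt) lo = gt + 1;      // answer lies among the larger elements
            else return pivot;                 // k falls inside the block equal to the pivot
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

For an array of n elements, MedianOfMedians.select(values, (n - 1) / 2) would return the lower median. The parallel variant discussed above is not shown here.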

Upvotes: 4

Nicolas

Reputation: 960

Try a red-black tree. It should work quite well, and with a binary search you get your O(log n). Insertion and removal also take O(log n), and rebalancing is done in O(log n) as well.
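One way to get O(log n) median removal out of balanced trees is to keep two of them, one per half of the set. The sketch below is only an illustration under that assumption: it uses two java.util.TreeSet instances (which are red-black trees), and the class name MedianSet and its methods are hypothetical, not part of this answer.

```java
import java.util.TreeSet;

// Hypothetical sketch: a set of distinct ints that supports O(log n) median removal.
final class MedianSet {
    // lower holds the smaller half, upper the larger half.
    // Invariant: lower.size() == upper.size() or lower.size() == upper.size() + 1,
    // and every element of lower is smaller than every element of upper.
    private final TreeSet<Integer> lower = new TreeSet<>();
    private final TreeSet<Integer> upper = new TreeSet<>();

    // Inserts e if it is not already present; O(log n).
    boolean add(int e) {
        if (lower.contains(e) || upper.contains(e)) return false;
        if (lower.isEmpty() || e < lower.last()) lower.add(e);
        else upper.add(e);
        rebalance();
        return true;
    }

    // Removes e if it is present; O(log n).
    boolean remove(int e) {
        boolean removed = lower.remove(e) || upper.remove(e);
        if (removed) rebalance();
        return removed;
    }

    // Returns the lower median, or null if the set is empty.
    Integer median() {
        return lower.isEmpty() ? null : lower.last();
    }

    // Removes and returns the lower median, or null if the set is empty; O(log n).
    Integer removeMedian() {
        Integer m = lower.pollLast();
        if (m != null) rebalance();
        return m;
    }

    // Each operation changes the size by at most one, so moving one element suffices.
    private void rebalance() {
        if (lower.size() > upper.size() + 1) upper.add(lower.pollLast());
        else if (upper.size() > lower.size()) lower.add(upper.pollFirst());
    }
}
```

The design relies on the largest element of the lower half always being the median, so no rank search inside a single tree is needed. For an even number of elements this sketch reports the lower of the two middle values.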

Upvotes: 4

RawMean

Reputation: 8755

To expand on rwong's answer, here is some example code:

// partial_sort example
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;


int main () {
  int myints[] = {9,8,7,6,5,4,3,2,1};
  vector<int> myvector (myints, myints+9);
  vector<int>::iterator it;

  partial_sort (myvector.begin(), myvector.begin()+5, myvector.end());

  // print out content:
  cout << "myvector contains:";
  for (it=myvector.begin(); it!=myvector.end(); ++it)
    cout << " " << *it;

  cout << endl;

  return 0;
}

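Possible output: myvector contains: 1 2 3 4 5 9 8 7 6

The element in the middle (index 4, the last element of the sorted prefix) is the median. partial_sort leaves the elements after the sorted prefix in an unspecified order, so the tail of the output may differ between implementations.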

Upvotes: 0

Master Yoda

Reputation: 587

I know one randomized algorithm with an expected time complexity of O(n).

Here is the algorithm:

Input: array of n numbers A[1...n] [without loss of generality we can assume n is even]

Output: the (n/2)th element of the sorted array.

Algorithm (A[1..n], k), initially called with k = n/2:

Pick a pivot index p uniformly at random from 1...n

Divide the other elements into 2 parts:

L - elements <= A[p] (not counting the pivot itself)

R - elements > A[p]

if (k == |L| + 1) A[p] is the answer; stop

if (k <= |L|) recurse on (L, k)

else recurse on (R, k - (|L| + 1))

Complexity: O(n) in expectation. The proof is purely mathematical and about one page long. If you are interested, ping me.
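The randomized selection described above could be sketched in Java roughly as follows. This is an illustrative version, not the author's own code: the class name QuickSelect, the 0-based rank k, and the in-place Lomuto partition are all assumptions made for the example.

```java
import java.util.Random;

// Hypothetical sketch of randomized selection (quickselect).
final class QuickSelect {
    private static final Random RNG = new Random();

    // Returns the lower median of a without modifying a.
    static int median(int[] a) {
        return select(a.clone(), (a.length - 1) / 2);
    }

    // Returns the k-th smallest element (k is 0-based); reorders a in place.
    static int select(int[] a, int k) {
        if (k < 0 || k >= a.length) throw new IllegalArgumentException("k out of range");
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            // Pick the pivot position uniformly at random within the current range.
            int p = partition(a, lo, hi, lo + RNG.nextInt(hi - lo + 1));
            if (k == p) return a[p];
            if (k < p) hi = p - 1;   // continue in the part left of the pivot
            else lo = p + 1;         // continue in the part right of the pivot
        }
        return a[lo];
    }

    // Lomuto partition around a[pivotIndex]; returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi, int pivotIndex) {
        int pivot = a[pivotIndex];
        swap(a, pivotIndex, hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

The loop replaces the recursion of the description with an explicit range [lo, hi], which avoids deep call stacks. Inputs with many equal values can push this simple partition toward its quadratic worst case.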

Upvotes: 2

user412090

Reputation: 356

Master Yoda's randomized algorithm has, of course, a minimum complexity of n like any other, an expected complexity of n (not log n) and a maximum complexity of n squared like Quicksort. It's still very good.

In practice, the "random" pivot choice might sometimes be a fixed location (without involving a RNG) because the initial array elements are known to be random enough (e.g. a random permutation of distinct values, or independent and identically distributed) or deduced from an approximate or exactly known distribution of input values.

Upvotes: 2

Sheldon L. Cooper

Reputation: 3260

Here's a solution in Java, based on TreeSet:

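import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

public class SetWithMedian {
    private NavigableSet<Integer> s = new TreeSet<Integer>();
    // Current median (the upper one when the size is even); null if the set is empty.
    private Integer m = null;

    public boolean contains(int e) {
        return s.contains(e);
    }
    public Integer getMedian() {
        return m;
    }
    public void add(int e) {
        s.add(e);
        updateMedian();
    }
    public void remove(int e) {
        s.remove(e);
        updateMedian();
    }
    // After a single insertion or removal, moves m by at most one position.
    private void updateMedian() {
        if (s.size() == 0) {
            m = null;
        } else if (s.size() == 1) {
            m = s.first();
        } else {
            SortedSet<Integer> h = s.headSet(m);        // elements < m
            SortedSet<Integer> t = s.tailSet(m, false); // elements > m (avoids overflow of m + 1)
            int x = 1 - s.size() % 2;                   // 1 if the size is even, 0 if odd
            if (h.size() < t.size() + x)
                m = t.first();
            else if (h.size() > t.size() + x)
                m = h.last();
        }
    }
}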

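Removing the median (i.e. "set.remove(set.getMedian())" on a SetWithMedian instance "set") is meant to take O(log n) time. One caveat: in OpenJDK's TreeSet, size() on a headSet/tailSet view counts the view's elements by iterating over them, so updateMedian() as written can cost O(n). Reaching O(log n) requires tracking the two counts incrementally or using a tree that stores subtree sizes.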

Edit: To help understand the code, here's the invariant condition of the class attributes:

private boolean isGood() {
    if (s.isEmpty()) {
        return m == null;
    } else {
        return s.contains(m) && s.headSet(m).size() + s.size() % 2 == s.tailSet(m).size();
    }
}

In human-readable form:

If the set "s" is empty, then "m" must be null.
If the set "s" is not empty, then it must contain "m".
Let x be the number of elements strictly less than "m", and let y be the number of elements greater than or equal to "m". Then, if the total number of elements is even, x must be equal to y; otherwise, x+1 must be equal to y.

Upvotes: 4

Tyler McHenry

Reputation: 76770

For a general, unsorted set, it is impossible to reliably find the median in better than O(n) time. You can find the median of a sorted set in O(1), or you can trivially sort the set yourself in O(n log n) time and then find the median in O(1), giving an O(n log n) algorithm. Or, finally, there are more clever median selection algorithms that work by partitioning instead of sorting and yield O(n) performance.

But if the set has no special properties and you are not allowed any pre-processing step, you will never get below O(n) by the simple fact that you will need to examine all of the elements at least once to ensure that your median is correct.
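For reference, the sort-then-index baseline mentioned above might look like the following minimal Java sketch. The class and method names are hypothetical, and taking the lower of the two middle values for even sizes is an assumption made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch of the O(n log n) baseline: sort, then read the middle.
final class SortedMedian {
    // Sorts a copy in O(n log n), then reads the median in O(1).
    static int medianBySorting(int[] values) {
        if (values.length == 0) throw new IllegalArgumentException("empty input");
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        return medianOfSorted(sorted);
    }

    // O(1) lookup when the input is already sorted (lower median for even sizes).
    static int medianOfSorted(int[] sorted) {
        return sorted[(sorted.length - 1) / 2];
    }
}
```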

Upvotes: 5

rwong

Reputation: 6162

For unsorted lists, repeatedly do an O(n) partial sort until the element located at the median position is known. This is still at least O(n) overall, though.

Is there any information about the elements being sorted?

Upvotes: 5
