01zhou
01zhou

Reputation: 334

Median of Medians

I have read the order statistics to find the k-th smallest (or largest) element in an array of size n in linear time O(n).

There is one step that it needs to find the median of the medians.

  1. Split the array into [n/5] parts. Each part has 5 elements.
  2. Find the median in each part. (We have [n/5] numbers now)
  3. Repeat step 1 and 2 until we only have the last number. (i.e. recursive)

T(n) = T(n/5) + O(n) and we can get T(n) = O(n).

But, is it true that, the number we finally get is not the median of medians, but the median of medians of medians of medians of medians of medians, if we have a large array.

Please consider an array which has 125 elements.

First, it is split into 25 parts and we find 25 medians. Then, we split these 25 numbers into 5 parts and find 5 medians, Finally, we obtain the number which is median of medians of medians. (Not median of medians)

The reason why I care about it is that, I can understand there are at most about [3/4]*n elements that are smaller (or larger) than the median of medians. But what if it is not the median of medians but the median of medians of medians? In worse case there must be less elements that are smaller (or larger) than the pivot, which means the pivot is closer to the bound of the array.

If we have a VERY large array, and we found its median of medians of medians of medians of medians of medians. In the worst case the pivot we found can still be very close to the bound and what is the time complexity in this case?

I made up a dataset of 125 elements. Is the result 9?

0.8 0.9 1 inf inf
1.8 1.9 2 inf inf
6.8 6.9 7 inf inf
inf inf inf inf inf
inf inf inf inf inf

2.8 2.9 3 inf inf
3.8 3.9 4 inf inf
7.8 7.9 8 inf inf
inf inf inf inf inf
inf inf inf inf inf

4.8 4.9 5 inf inf
5.8 5.9 6 inf inf
8.8 8.9 9 inf inf
inf inf inf inf inf
inf inf inf inf inf

inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf

inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf

where inf means the number is large enough.

Upvotes: 9

Views: 3061

Answers (2)

OriShuss
OriShuss

Reputation: 269

The median of medians algorithm works in conjunction with the quick selection algorithm. Your confusion (and mine as well, before I realized the answer) stems from the fact that the 2nd step you stated is inaccurate: The recursion calls select and not medOfMeds, so the result found is accurate and not an estimate.

If we denote the median of medians function medOfMeds and the quick selection function select, then select calls medOfMeds which in turn calls select again.

Therefore, the median of medians in the example dataset you provided will not be 9. The median of medians is the actual median of the list of medians (and not estimate, like the medOfMeds function finds), which is found using select. In the example dataset, there are 16/25 infs, so the median of medians is the 12 or 13 element (out of 25), which is inf.

Upvotes: 0

gramonov
gramonov

Reputation: 527

Let's denote your median of medians of medians of ... as [median of]* = M.

First, I believe that median of medians algorithm (to select a good pivot) is not recursive. The algorithm goes as follows:

  1. Split the elements in the groups of 5
  2. Find the median of each group
  3. Find the median of medians and use it as a pivot.

Median of medians will be smaller than 3n/10 elements and larger than another 3n/10 elements, not 3n/4. You have n/5 numbers after selecting medians. Median of median is greater/smaller than half of those numbers, which is n/10. Each of those numbers is a median itself, so it's greater/smaller than 2 numbers, giving you another 2n/10 numbers. Now in total, you get n/10 + 2n/10 = 3n/10 numbers.

To address your second question, after collecting the group of 5's in your example dataset and calculating their medians, we will have the following sequence:

1, 2, 7, inf, inf
3, 4, 8, inf, inf
5, 6, 9, inf, inf, 
inf, inf, inf, inf, inf, 
inf, inf, inf, inf, inf.

So the median of medians would indeed be 9.

Your proposed [median of]* algorithm's runtime will be:

T(n) = O(n * log(n))

Now let's try to analyze how many numbers we have less/greater than M. We have the following groups:

  • depth 1: n/5 elements all medians
  • depth 2: n/25 elements all medians
  • ...
  • depth i: n/(5^i) elements all medians

Each group is less/greater than 2 elements of the previous depth, which is less/greater than 2 elements of the previous depth, and so on:

Calculating in total, we get that our M is greater/less than (n * (2^k) + k * n) /((2^k) * (5^k)). For depth = 1 you get median of medians, which is 3n/10.

Now assuming your depth is [log_5 (n)], i.e. n = 5^k, we get:

5^k * (k + 2^k)/(5^k * 2^k) which is -> 1.

Upvotes: 2

Related Questions