Reputation: 334
I have read the order statistics to find the k-th smallest (or largest) element in an array of size n in linear time O(n).
There is one step that it needs to find the median of the medians.
T(n) = T(n/5) + O(n) and we can get T(n) = O(n).
But, is it true that, the number we finally get is not the median of medians, but the median of medians of medians of medians of medians of medians, if we have a large array.
Please consider an array which has 125 elements.
First, it is split into 25 parts and we find 25 medians. Then, we split these 25 numbers into 5 parts and find 5 medians, Finally, we obtain the number which is median of medians of medians. (Not median of medians)
The reason why I care about it is that, I can understand there are at most about [3/4]*n elements that are smaller (or larger) than the median of medians. But what if it is not the median of medians but the median of medians of medians? In worse case there must be less elements that are smaller (or larger) than the pivot, which means the pivot is closer to the bound of the array.
If we have a VERY large array, and we found its median of medians of medians of medians of medians of medians. In the worst case the pivot we found can still be very close to the bound and what is the time complexity in this case?
I made up a dataset of 125 elements. Is the result 9?
0.8 0.9 1 inf inf
1.8 1.9 2 inf inf
6.8 6.9 7 inf inf
inf inf inf inf inf
inf inf inf inf inf
2.8 2.9 3 inf inf
3.8 3.9 4 inf inf
7.8 7.9 8 inf inf
inf inf inf inf inf
inf inf inf inf inf
4.8 4.9 5 inf inf
5.8 5.9 6 inf inf
8.8 8.9 9 inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
where inf means the number is large enough.
Upvotes: 9
Views: 3061
Reputation: 269
The median of medians algorithm works in conjunction with the quick selection algorithm.
Your confusion (and mine as well, before I realized the answer) stems from the fact that the 2nd step you stated is inaccurate: The recursion calls select
and not medOfMeds
, so the result found is accurate and not an estimate.
If we denote the median of medians function medOfMeds
and the quick selection function select
, then select
calls medOfMeds
which in turn calls select
again.
Therefore, the median of medians in the example dataset you provided will not be 9. The median of medians is the actual median of the list of medians (and not estimate, like the medOfMeds
function finds), which is found using select
. In the example dataset, there are 16/25 inf
s, so the median of medians is the 12 or 13 element (out of 25), which is inf
.
Upvotes: 0
Reputation: 527
Let's denote your median of medians of medians of ... as [median of]* = M.
First, I believe that median of medians algorithm (to select a good pivot) is not recursive. The algorithm goes as follows:
Median of medians will be smaller than 3n/10 elements and larger than another 3n/10 elements, not 3n/4. You have n/5 numbers after selecting medians. Median of median is greater/smaller than half of those numbers, which is n/10. Each of those numbers is a median itself, so it's greater/smaller than 2 numbers, giving you another 2n/10 numbers. Now in total, you get n/10 + 2n/10 = 3n/10 numbers.
To address your second question, after collecting the group of 5's in your example dataset and calculating their medians, we will have the following sequence:
1, 2, 7, inf, inf
3, 4, 8, inf, inf
5, 6, 9, inf, inf,
inf, inf, inf, inf, inf,
inf, inf, inf, inf, inf.
So the median of medians would indeed be 9.
Your proposed [median of]* algorithm's runtime will be:
T(n) = O(n * log(n))
Now let's try to analyze how many numbers we have less/greater than M. We have the following groups:
Each group is less/greater than 2 elements of the previous depth, which is less/greater than 2 elements of the previous depth, and so on:
Calculating in total, we get that our M is greater/less than (n * (2^k) + k * n) /((2^k) * (5^k)). For depth = 1 you get median of medians, which is 3n/10.
Now assuming your depth is [log_5 (n)], i.e. n = 5^k, we get:
5^k * (k + 2^k)/(5^k * 2^k) which is -> 1.
Upvotes: 2