Reputation: 2315
I have an array, let's say a = {1, 4, 5, 6, 2, 23, 4, 2};
Now I have to find the median of array positions 2 to 6 (an odd number of terms). What I have done is copy a[1] to a[5] into arr[0] to arr[4], then sort arr and take arr[2] as the median.
But this way I copy values from one array into another every time, so that the values of my initial array remain the same. On top of that, I sort the copy, so this procedure takes quite a lot of **time**.
So I want to know whether there is a different way to do this that reduces my computation time.
Are there any websites or materials that explain what to do and how to do it?
Upvotes: 10
Views: 14176
Reputation: 4153
All existing answers have some downsides in certain situations:
std::nth_element is more efficient than sorting, but it still mutates the subrange, so one still needs an additional array.
For this reason, I am posting my approach, which uses std::map and is inspired by the selection sort algorithm:
First, collect the frequencies of the elements of the first subrange in a std::map<int, int>.
With this object, we can efficiently find the median of the subrange whose length is subrangeLength:
#include <map>
#include <optional>

double median(const std::map<int, int> &histogram, int subrangeLength)
{
    const int middle{subrangeLength / 2};
    int count{0};

    /* We use the fact that keys in std::map are sorted, so by simply iterating
       and adding up the frequencies, we can find the median. */
    if (subrangeLength % 2 == 1) {
        for (const auto &freq : histogram) {
            count += freq.second;
            /* In case where subrangeLength is odd, "middle" is the lower integer bound of
               subrangeLength / 2, so as soon as we cross it, we have found the median. */
            if (count > middle) {
                return freq.first;
            }
        }
    } else {
        std::optional<double> medLeft;
        for (const auto &freq : histogram) {
            count += freq.second;
            /* In case where subrangeLength is even, we need to pay attention to the case when
               elements at positions middle and middle + 1 are different. */
            if (count == middle) {
                medLeft = freq.first;
            } else if (count > middle) {
                if (!medLeft) {
                    medLeft = freq.first;
                }
                return (*medLeft + freq.first) / 2.0;
            }
        }
    }

    /* Only reached if the histogram holds fewer than subrangeLength elements. */
    return -1;
}
When we want the median of the next subrange, we simply update the histogram by decreasing the frequency of the element that leaves the subrange and adding or increasing the frequency of the new element (with std::map, each update takes O(log n) time; an entry whose frequency drops to zero has to be erased, otherwise the even-length branch above can pick it up as the left median). We then compute the median again and continue until all subranges are handled.
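The update step described above can be sketched as follows. This is a minimal illustration, assuming the subranges are consecutive windows of a fixed length over the same array; the names medianFromHistogram and slidingMedians are hypothetical, and the median helper is a compact variant written for this sketch rather than the function above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Median of the multiset stored in `histogram` (value -> frequency), which is
// assumed to hold exactly `length` elements in total.
double medianFromHistogram(const std::map<int, int>& histogram, int length) {
    assert(length > 0);
    const int lowerRank = (length - 1) / 2;  // 0-based rank of the lower middle
    const int upperRank = length / 2;        // 0-based rank of the upper middle
    int seen = 0;
    bool haveLower = false;
    double lower = 0.0;
    for (const auto& entry : histogram) {    // keys are visited in sorted order
        seen += entry.second;
        if (!haveLower && seen > lowerRank) {
            lower = entry.first;
            haveLower = true;
        }
        if (seen > upperRank) {
            return (lower + entry.first) / 2.0;
        }
    }
    return -1;  // only reached if the histogram holds fewer than `length` elements
}

// Medians of all consecutive windows of size `window` in `a`.
// The input array is never modified or copied.
std::vector<double> slidingMedians(const std::vector<int>& a, std::size_t window) {
    assert(window > 0 && window <= a.size());
    std::map<int, int> histogram;
    for (std::size_t i = 0; i < window; ++i) {
        ++histogram[a[i]];
    }
    std::vector<double> result;
    result.push_back(medianFromHistogram(histogram, static_cast<int>(window)));
    for (std::size_t i = window; i < a.size(); ++i) {
        ++histogram[a[i]];                              // element entering the window
        auto outgoing = histogram.find(a[i - window]);  // element leaving the window
        if (--outgoing->second == 0) {
            histogram.erase(outgoing);                  // drop empty entries
        }
        result.push_back(medianFromHistogram(histogram, static_cast<int>(window)));
    }
    return result;
}
```

Each slide costs two O(log n) map updates, but finding the median still walks the distinct keys of the window, so this pays off mainly when the window contains few distinct values.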
Upvotes: 0
Reputation: 4216
If you are doing multiple queries on the same array then you could use a Segment Tree. They are generally used to do range minimum/maximum and range sum queries but you can change it to do range median.
A segment tree for a set with n intervals uses O(n log n) storage and can be built in O(n log n) time. A range query can be done in O(log n).
Example of median in range segment tree:
You build the segment tree from the bottom up (update from the top down):
                        [5]
            [3]                     [7]
  [1,2]           [4]         [6]         [8]
 1     2     3     4     5     6     7     8
Indices covered by node:
                        [4]
            [2]                     [6]
  [0,1]           [3]         [5]         [7]
 0     1     2     3     4     5     6     7
A query for median for range indices of 4-6 would go down this path of values:
                        [4]
                              [5]
 0     1     2     3     4     5     6     7
Doing a search for the median, you know the total number of elements in the query (3), and the median in that range would be the 2nd element (index 5). So you are essentially searching for the first node which contains that index, which is the node with value [6] (index 5).
Doing a search of the median of the range 3-6 is a bit more complicated because you have to search for two indices (4,5) which happen to lie in the same node.
                        [4]
                                    [6]
                              [5]
 0     1     2     3     4     5     6     7
Range minimum query on Segment Tree
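The example above uses values that are already in sorted order. For an arbitrary array, one common way to get range medians out of a segment tree is a "merge sort tree", where every node stores a sorted copy of the values it covers. The sketch below is a swapped-in illustration of that idea, not necessarily the structure this answer had in mind; the class name MergeSortTree and its methods are hypothetical. It matches the O(n log n) storage and build time quoted above, but with the simple binary search used here a query costs roughly O(log^3 n) rather than O(log n).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

class MergeSortTree {
public:
    explicit MergeSortTree(const std::vector<int>& a)
        : n_(a.size()), nodes_(4 * std::max<std::size_t>(a.size(), 1)) {
        if (n_ > 0) build(1, 0, n_ - 1, a);
    }

    // k-th smallest (k is 1-based) among a[l..r], 0-based inclusive indices.
    int kthSmallest(std::size_t l, std::size_t r, std::size_t k) const {
        assert(l <= r && r < n_ && k >= 1 && k <= r - l + 1);
        const std::vector<int>& all = nodes_[1];  // root: every value, sorted
        std::size_t lo = 0, hi = all.size() - 1;
        // Smallest value v such that at least k elements of a[l..r] are <= v.
        while (lo < hi) {
            const std::size_t mid = lo + (hi - lo) / 2;
            if (countLessEqual(1, 0, n_ - 1, l, r, all[mid]) >= k) hi = mid;
            else lo = mid + 1;
        }
        return all[lo];
    }

    double median(std::size_t l, std::size_t r) const {
        const std::size_t len = r - l + 1;
        if (len % 2 == 1) return kthSmallest(l, r, len / 2 + 1);
        return (static_cast<double>(kthSmallest(l, r, len / 2)) +
                kthSmallest(l, r, len / 2 + 1)) / 2.0;
    }

private:
    void build(std::size_t node, std::size_t lo, std::size_t hi,
               const std::vector<int>& a) {
        if (lo == hi) {
            nodes_[node].push_back(a[lo]);
            return;
        }
        const std::size_t mid = lo + (hi - lo) / 2;
        build(2 * node, lo, mid, a);
        build(2 * node + 1, mid + 1, hi, a);
        // A parent holds the sorted merge of its two children.
        std::merge(nodes_[2 * node].begin(), nodes_[2 * node].end(),
                   nodes_[2 * node + 1].begin(), nodes_[2 * node + 1].end(),
                   std::back_inserter(nodes_[node]));
    }

    // Number of elements <= x among a[l..r].
    std::size_t countLessEqual(std::size_t node, std::size_t lo, std::size_t hi,
                               std::size_t l, std::size_t r, int x) const {
        if (r < lo || hi < l) return 0;
        if (l <= lo && hi <= r) {
            const std::vector<int>& v = nodes_[node];
            return static_cast<std::size_t>(
                std::upper_bound(v.begin(), v.end(), x) - v.begin());
        }
        const std::size_t mid = lo + (hi - lo) / 2;
        return countLessEqual(2 * node, lo, mid, l, r, x) +
               countLessEqual(2 * node + 1, mid + 1, hi, l, r, x);
    }

    std::size_t n_;
    std::vector<std::vector<int>> nodes_;
};
```

The tree is built once and never touches the original array, so it suits the "many queries on the same array" case; for a single query, copying the subrange and selecting is simpler.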
Upvotes: 6
Reputation: 1100
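Use std::nth_element from <algorithm>, which is O(N) on average:
// Reorders a so that a[size / 2] holds the value it would have if a were sorted.
std::nth_element(a, a + size / 2, a + size);
median = a[size / 2];  // for even size this is the upper of the two middle values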
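Applied to the question's subrange, this could look like the sketch below. It assumes 0-based inclusive bounds and works on a scratch copy so the original array keeps its order; the function name medianOfSubrange is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a[first..last] (0-based, inclusive) without modifying `a`.
double medianOfSubrange(const std::vector<int>& a, std::size_t first, std::size_t last) {
    assert(first <= last && last < a.size());
    // Copy only the subrange; std::nth_element reorders the range it is given.
    std::vector<int> scratch(a.begin() + static_cast<std::ptrdiff_t>(first),
                             a.begin() + static_cast<std::ptrdiff_t>(last) + 1);
    auto mid = scratch.begin() + static_cast<std::ptrdiff_t>(scratch.size() / 2);
    std::nth_element(scratch.begin(), mid, scratch.end());
    if (scratch.size() % 2 == 1) {
        return *mid;
    }
    // Even length: everything before `mid` is <= *mid, so the lower middle
    // value is the largest element in that front part.
    const int lowerMiddle = *std::max_element(scratch.begin(), mid);
    return (static_cast<double>(lowerMiddle) + *mid) / 2.0;
}
```

The copy is still needed, but the full sort is replaced by a selection that is linear on average.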
Upvotes: 22
Reputation: 280
I think the best way is to use the median of medians algorithm for finding the k-th largest element of an array. You can find the overall idea of the algorithm here: Median of Medians in Java, on Wikipedia: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm, or just search the internet. Some general improvements can be made during implementation (for example, avoid sorting when choosing the median of the individual sub-arrays). However, note that for an array of fewer than 50 elements it is more efficient to use insertion sort than the median of medians algorithm.
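A minimal sketch of the median of medians idea, assuming groups of five and extra vectors for the partition step to keep it readable (a tuned version would partition in place). The function name selectKth is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the k-th smallest value (k is 0-based) of `v`.
// `v` is taken by value, so the caller's data is left untouched.
int selectKth(std::vector<int> v, std::size_t k) {
    assert(k < v.size());
    while (true) {
        if (v.size() <= 5) {
            std::sort(v.begin(), v.end());
            return v[k];
        }
        // 1. Split into groups of five and collect the median of each group.
        std::vector<int> medians;
        for (std::size_t i = 0; i < v.size(); i += 5) {
            const std::size_t end = std::min(i + 5, v.size());
            std::sort(v.begin() + static_cast<std::ptrdiff_t>(i),
                      v.begin() + static_cast<std::ptrdiff_t>(end));
            medians.push_back(v[i + (end - i - 1) / 2]);
        }
        // 2. The pivot is the median of those medians, found recursively.
        const int pivot = selectKth(medians, medians.size() / 2);
        // 3. Partition around the pivot and keep only the part containing rank k.
        std::vector<int> less, greater;
        std::size_t equal = 0;
        for (int x : v) {
            if (x < pivot) less.push_back(x);
            else if (x > pivot) greater.push_back(x);
            else ++equal;
        }
        if (k < less.size()) {
            v = std::move(less);
        } else if (k < less.size() + equal) {
            return pivot;
        } else {
            k -= less.size() + equal;
            v = std::move(greater);
        }
    }
}
```

For the subrange in the question, selectKth(sub, sub.size() / 2) on a copy of the five elements returns the median; at that size the function simply sorts, which is in line with the remark about small arrays.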
Upvotes: 0
Reputation: 145919
To find the median of an array of fewer than 9 elements, I think the most efficient approach is to use a sorting algorithm like insertion sort. Its asymptotic complexity is bad, but for such a small array, because of the constant factor k in the complexity of better algorithms like quicksort, insertion sort is very efficient. Do your own benchmark, but I can tell you that you will get better results with insertion sort than with shell sort or quicksort.
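As a rough sketch of this suggestion, the function below insertion-sorts a small copy and reads off the middle. The name smallMedian is hypothetical, and whether it actually beats the alternatives on a given machine is something to benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a small set of values via insertion sort.
// `values` is passed by value, so the caller's array is not modified.
double smallMedian(std::vector<int> values) {
    assert(!values.empty());
    for (std::size_t i = 1; i < values.size(); ++i) {
        const int key = values[i];
        std::size_t j = i;
        // Shift larger elements one slot to the right to make room for key.
        while (j > 0 && values[j - 1] > key) {
            values[j] = values[j - 1];
            --j;
        }
        values[j] = key;
    }
    const std::size_t n = values.size();
    if (n % 2 == 1) {
        return values[n / 2];
    }
    return (static_cast<double>(values[n / 2 - 1]) + values[n / 2]) / 2.0;
}
```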
Upvotes: 1
Reputation: 106327
It is possible to find the median without sorting in O(n) time; algorithms that do this are called selection algorithms.
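One well-known selection algorithm is quickselect. The sketch below assumes a random pivot and a three-way partition, which gives O(n) expected time (not worst-case); the function name quickselect is an illustrative choice, not a library call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Returns the k-th smallest value (k is 0-based) of `values`.
// Works on a copy, so the caller's data keeps its order.
int quickselect(std::vector<int> values, std::size_t k) {
    assert(k < values.size());
    std::mt19937 rng(std::random_device{}());
    const auto target = values.begin() + static_cast<std::ptrdiff_t>(k);
    auto first = values.begin();
    auto last = values.end();
    while (true) {
        // Pick a random pivot from the part that still contains the target.
        std::uniform_int_distribution<std::ptrdiff_t> pick(0, (last - first) - 1);
        const int pivot = first[pick(rng)];
        // Rearrange into [ < pivot | == pivot | > pivot ].
        auto lessEnd = std::partition(first, last,
                                      [pivot](int x) { return x < pivot; });
        auto equalEnd = std::partition(lessEnd, last,
                                       [pivot](int x) { return x == pivot; });
        if (target < lessEnd) {
            last = lessEnd;        // answer is among the smaller values
        } else if (target < equalEnd) {
            return pivot;          // answer equals the pivot
        } else {
            first = equalEnd;      // answer is among the larger values
        }
    }
}
```

For an odd-length subrange, quickselect(sub, sub.size() / 2) gives the median directly.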
Upvotes: 15