Reputation: 1364
After careful research and thought, I decided to post this question which is a "sequel" to my previous question asked earlier today.
I made an algorithm that finds the median of an ArrayList and bascially all I do is make a temporary ArrayList, then using Collections.sort() on that ArrayList, I can easily get the median. The problem with that is it takes too long for larger files and I am trying(without luck) to find an implementation of an algorithm to get the median of an unsorted Array(or ArrayList).
From what I read here, the Median of Medians algorithm is used followed by QuickSelect but I cannot find an actual implementation that is easy enough to understand.
Here's a snippet of my code that finds the median of an ArrayList of size filterSize
:
while(elements.size()-counter >= filterSize){
for(int i = 0; i<filterSize; i++){
tempElements.add(this.elements.get(i+counter));
if(i==filterSize){
break;
}
}
Collections.sort(tempElements); //Sort tempElements to find median
outputElements.add(tempElements.get((filterSize-1)/2)); //Add median to an output ArrayList after calculating median index
counter++;
tempElements.clear(); //Removes all elements from the tempElements and start again
}
Basically I'm trying to avoid the whole use of Collections.sort()
and tempElements.clear()
in the code, thus the reason to find a better algorithm for finding the median in linear time.
Thanks.
Upvotes: 2
Views: 2683
Reputation: 4661
Just to add to another answer, "Quickselect" for finding the median has very tight running time guarantees if you randomly choose each pivot, namely "almost certain" linear time, meaning that the probability of getting a running time greater than cn goes to 0 very quickly as the constant c grows, as n becomes large. So unless you are worried about odds even less likely than winning the lottery, for all intents and purposes there is a constant c such that you will never see a running time greater than cn, regardless of n.
Upvotes: 1
Reputation: 43477
I think the basic Quickselect algorithm (code below from this link) is quite easy to understand: you pick a pivot, apply Quicksort's partition function and then see where that pivot ends up, recursing accordingly to only one of the halves.
function partition(list, left, right, pivotIndex)
pivotValue := list[pivotIndex]
swap list[pivotIndex] and list[right] // Move pivot to end
storeIndex := left
for i from left to right-1
if list[i] < pivotValue
swap list[storeIndex] and list[i]
increment storeIndex
swap list[right] and list[storeIndex] // Move pivot to its final place
return storeIndex
// Returns the n-th smallest element of list within left..right inclusive
// (i.e. left <= n <= right).
// The size of the list is not changing with each recursion.
// Thus, n does not need to be updated with each round.
function select(list, left, right, n)
if left = right // If the list contains only one element,
return list[left] // return that element
pivotIndex := ... // select a pivotIndex between left and right,
// e.g., left + floor(rand() * (right - left + 1))
pivotIndex := partition(list, left, right, pivotIndex)
// The pivot is in its final sorted position
if n = pivotIndex
return list[n]
else if n < pivotIndex
return select(list, left, pivotIndex - 1, n)
else
return select(list, pivotIndex + 1, right, n)
Compared to median of medians, this can degenerate to O(n^2)
, but you can significantly reduce the chances of that happening by choosing the pivot randomly, like described in the comments.
If you're not happy with an implementation of median of medians that you don't fully understand, I suggest you go for something like this.
Upvotes: 4