IdeaHat
IdeaHat

Reputation: 7881

Efficient median calculation for small dataset in C++

I have many (hundreds of thousands, m) sets of doubles d, ~5-10 (n, constant small) long. These doubles are essentially randomly distributed. I need to get the median of each set: because m is very large, we need to calculate the median pretty quickly...these sets are pretty small though, so I think that is going to play a significant role in choosing how to do the median. I know I can use nth_element to get the median in O(n) with the selection algorithm, which I know I'm not going to beat in complexity. However, because of the small constant n, I am probably looking for the method that simply has the smallest overhead.

I have found a bunch of different ways to do the median (below) but am just curios if anyone knows the "correct" method to use here.

Min max heaps (O(n) build time, constant access, probably too much overhead)

This question from 2010 Which may be out of date (new STL/Boost code may already implement this stuff), also focuses more on time complexity than overhead.

Upvotes: 1

Views: 1199

Answers (2)

ogni42
ogni42

Reputation: 1276

if it is std::set (you didnt answer to BoBTFish) then it is already sorted. Hence, you will get the median by iterating to n/2 which is always better or equal O(n), typically is should be O(ld n). n-th element will not help here.

Upvotes: 0

Roger Rowland
Roger Rowland

Reputation: 26259

This may not scale well to your data sizes, but it's a code snippet I found (can't remember where) and use in my image processing functions to get the median of 9 unsigned char pixels.

// optimised median search on 9 values
#define PIX_SWAP(a, b) { unsigned char uTemp = (a); (a) = (b); (b) = uTemp; }
#define PIX_SORT(a, b) { if ((a) > (b)) PIX_SWAP((a), (b)); }

unsigned char GetMedian9(unsigned char *pNine)
{
    // nb - this is theoretically the fastest way to get the median of 9 values
    PIX_SORT(pNine[1], pNine[2]); PIX_SORT(pNine[4], pNine[5]); PIX_SORT(pNine[7], pNine[8]); 
    PIX_SORT(pNine[0], pNine[1]); PIX_SORT(pNine[3], pNine[4]); PIX_SORT(pNine[6], pNine[7]); 
    PIX_SORT(pNine[1], pNine[2]); PIX_SORT(pNine[4], pNine[5]); PIX_SORT(pNine[7], pNine[8]); 
    PIX_SORT(pNine[0], pNine[3]); PIX_SORT(pNine[5], pNine[8]); PIX_SORT(pNine[4], pNine[7]); 
    PIX_SORT(pNine[3], pNine[6]); PIX_SORT(pNine[1], pNine[4]); PIX_SORT(pNine[2], pNine[5]); 
    PIX_SORT(pNine[4], pNine[7]); PIX_SORT(pNine[4], pNine[2]); PIX_SORT(pNine[6], pNine[4]); 
    PIX_SORT(pNine[4], pNine[2]); return(pNine[4]);
}

#undef PIX_SWAP
#undef PIX_SORT

EDIT - Ok, it's also referenced in this answer too

Upvotes: 1

Related Questions