Pyderman
Pyderman

Reputation: 16189

Getting the two numbers on either side of the median

Given:

x = [1.23, 2.0, 3.45, 4.1]

then:

middle = numpy.median(x)

Were the size of list x odd, I could use x[x.index(middle)-1] and x[x.index(middle)+1] to get the two numbers either side of the middle. This won't work in the above case, since the median is not present in x. Is there is standard approach that can handle both even-sized and odd-sized lists?

Upvotes: 3

Views: 1899

Answers (6)

LetzerWille
LetzerWille

Reputation: 5658

You can use the bisect module:

x = [1.23, 2.0, 3.45, 4.1]  

 def select_near_median(list_):
    # gets two numbers either side of median
    # in a sorted list
    # bisect finds the index where median number
    # can be inserted
    import bisect
    import statistics
    list_ = sorted(list_)
    med = statistics.median(list_)
    ind = bisect.bisect_left(list_, med)
    if med == list_[ind]:
        left = ind
        right = left + 1
    else:
        left = ind - 1
        right = ind
    return list_[left], list_[right]


        print(select_near_median([1,2,3,4]))
        (2, 3)
        print(select_near_median([1,2,3,4,5]))
        (3, 4)
        print(select_near_med(x))
        (2.0, 3.45)

Upvotes: 0

Padraic Cunningham
Padraic Cunningham

Reputation: 180391

To get the median you have to have a sorted list so this is a simple math problem, if the list length is uneven you want the halfway point - 1 and the halfway point + 1, if the list length is even the median is the average of the two middle numbers so you want those two middle numbers.

def get_two(l):
    ln = len(l)
    half = ln // 2
    return x[half-1], x[half + ln % 2]

Upvotes: 2

ivan_pozdeev
ivan_pozdeev

Reputation: 35998

By definition, a median is a value that divides your samples into 2 halves.

The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one (e.g., the median of {3, 3, 5, 9, 11} is 5). If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values (the median of {3, 5, 7, 9} is (5 + 7) / 2 = 6).

So, you need to

  • somehow determine which of your samples are the "lower half" and which are the "upper half", then
  • select the max & min ones from the subsets, accordingly.

Possible approaches include:

  • sort the entire list (probably, in-place for efficiency), then select the corresponding elements. (O(N*log(N)))
  • traverse the list, sort the elements into "lower" and "upper" parts as you go (effectively, you'll need to calculate the median at each step to classify the next element) and keep track of your "boundary" values (you'll need them to calculate the median anyway) (O(N))

So, basically, what you need is (altered code from the linked source for your 1-D case):

if sz % 2 == 0:
    part = partition(a, ((sz // 2) - 1, sz // 2))
else:
    part = partition(a, (sz - 1) // 2)

then retrieve the corresponding elements.

Note, however, if you strive for efficiency, that there's quite an overhead converting data into ndarray.

Upvotes: 1

Régis B.
Régis B.

Reputation: 10588

These are the numbers you are looking for:

x[(len(x)-1)/2 - len(x)%2], x[len(x)/2 + len(x)%2]

Upvotes: 3

sgrg
sgrg

Reputation: 1230

If the input list is unsorted (say x = [1.23, 3.45, 4.1, 2.0]) then you want to iterate through and find the two quantities of interest (this would also work for sorted inputs of course). Something like this:

largestSmallerThanMedian = x[0]
smallestLargerThanMedian = x[len(x)-1] 
for n in x:
    if (n < middle) and (n >= largestSmallerThanMedian):
        largestSmallerThanMedian = n
    if (n > middle) and (n <= smallestLargerThanMedian):
        smallestLargerThanMedian = n

And then largestSmallerThanMedian and smallestLargerThanMedian would have the two quantities of interest.

Upvotes: 1

ewcz
ewcz

Reputation: 13087

Assuming that the length of the sorted input list is N then it seems to me that you want access elements N/2-1 and (N+1)/2 (assuming integer division), i.e.,

[1.23, 2.0, 3.45, 4.1] => N = 4 thus N/2-1 = 1 and (N+1)/2 = 2

[1.23, 2.0, 3.45, 4.1, 5.6] => N = 5 thus N/2-1 = 1 and (N+1)/2 = 3

Upvotes: 3

Related Questions