Nik
Nik

Reputation: 1221

weighted median and list comprehension

I would like to compute the weighted median of a list of unique values and a list of weights. Weights tell how often each value is present in the list.

Example:

real_data = [1,1,2,3,3,4,4,4]
values = [1,2,3,4]
weights = [2,1,2,3]

One way to do it should be:

np.median(np.repeat(values, weights))

However I feel like this is a bit inefficient since it first generated the whole list, which could be a problem if the weights are very high. Is there a more efficient way of doing this?

Also, out of curiosity, can you think of a way to write np.repeat as a list comprehension?

Upvotes: 0

Views: 138

Answers (1)

joostblack
joostblack

Reputation: 2525

The solution I came up with:

def median_3(weights, values):
    s=0
    n=sum(weights)
    for i,w in enumerate(weights):
        s+=w
        if s>n/2:
            if n%2 == 0:
                if s-w==n/2:
                    return (values[i]+values[i-1])/2
                else:
                    return values[i]
            else:
                return values[i]

The code for time comparison:

import timeit


def median_1(weights, values):
    return np.median(np.repeat(values, weights))

def median_3(weights, values):
    s=0
    n=sum(weights)
    for i,w in enumerate(weights):
        s+=w
        if s>n/2:
            if n%2 == 0:
                if s-w==n/2:
                    return (values[i]+values[i-1])/2
                else:
                    return values[i]
            else:
                return values[i]
    
t1 = timeit.Timer(lambda: median_1(weights, values))
t3 = timeit.Timer(lambda: median_3(weights, values))

print(f"function median_1 for 1000 cycles: {t1.timeit(1000)} s")
print(f"function median 3 for 1000 cycles: {t3.timeit(1000)} s")


print(f" result from median_1 {median_1(weights, values)}")
print(f" result from median_3 {median_3(weights, values)}")

result:

function median_1 for 1000 cycles: 0.051409600000000055 s
function median 3 for 1000 cycles: 0.0013161999999999896 s
 result from median_1 3.0
 result from median_3 3

Hope this helps. It should also work for even number of elements.

Upvotes: 1

Related Questions