Max Ghenis

Reputation: 15803

Weighted version of scipy percentileofscore

I'd like to pass weights to scipy.stats.percentileofscore. For example:

from scipy import stats
a = [1, 2, 3, 4]
val = 3
stats.percentileofscore(a, val)

This returns 75, since 75% of the values in a are at or below val (3).

I'd like to add weights, for example:

weights = [2, 2, 3, 3]
weightedpercentileofscore(a, val, weights)

This should return 70, since (2 + 2 + 3) / (2 + 2 + 3 + 3) = 7 / 10 of the total weight falls at or below 3.

This should also work for decimal weights and large weights, so just expanding the arrays isn't ideal.

The question "Weighted percentile using numpy" is relevant, but it calculates percentile values (e.g. the value at the 10th percentile) rather than the percentile rank of a specific value.

Upvotes: 1

Views: 554

Answers (1)

Fabio Veronese

Reputation: 8360

This should do the job.

import numpy as np

def weighted_percentile_of_score(a, weights, score, kind='weak'):
    npa = np.asarray(a)
    npw = np.asarray(weights)
    total = npw.sum()

    if kind == 'rank':  # Treated as 'weak' here, since the scores carry weights.
        kind = 'weak'
    if kind not in ('strict', 'weak', 'mean'):
        raise ValueError("kind must be 'rank', 'strict', 'weak' or 'mean'")

    # Percent of total weight on scores strictly below / at or below `score`.
    strict = 100 * npw[npa < score].sum() / total
    weak = 100 * npw[npa <= score].sum() / total

    if kind == 'strict':
        return strict
    if kind == 'weak':
        return weak
    return (strict + weak) / 2  # kind == 'mean'


a = [1, 2, 3, 4]
weights = [2, 2, 3, 3]
print(weighted_percentile_of_score(a, weights, 3))  # 70.0 as desired.

In essence, you sum the weights of the scores less than or equal to your threshold score, divide by the total sum of weights, and express the result as a percentage.
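As a sanity check on that definition: with integer weights, the weighted result should match what you get by repeating each value according to its weight and counting. The sketch below uses only NumPy. The helper name weak_weighted_pct is hypothetical, not part of the function above, and it covers only the "at or below" case.

```python
import numpy as np

def weak_weighted_pct(a, weights, score):
    """Percent of total weight on values <= score (hypothetical helper)."""
    a = np.asarray(a)
    w = np.asarray(weights, dtype=float)
    return 100.0 * w[a <= score].sum() / w.sum()

a = [1, 2, 3, 4]
weights = [2, 2, 3, 3]

# Integer weights: expand each value by its weight and count directly.
expanded = np.repeat(a, weights)  # 2 ones, 2 twos, 3 threes, 3 fours
by_expansion = 100.0 * np.mean(expanded <= 3)
by_weights = weak_weighted_pct(a, weights, 3)

# Decimal weights cannot be expanded this way, but the weighted sum still works.
decimal = weak_weighted_pct(a, [0.2, 0.2, 0.3, 0.3], 3)
```

Both by_expansion and by_weights should agree for integer weights, while the weighted form also handles fractional or very large weights without building a bigger array.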

To get each value's corresponding weighted percentile as an array:

[weighted_percentile_of_score(a, weights, val) for val in a]
# [20.0, 40.0, 70.0, 100.0]
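The list comprehension rescans the whole array for every value, which may get slow for large inputs. One possible alternative, sketched here under the assumption that only the "at or below" ('weak') behaviour is needed, sorts once and looks up cumulative weights. The function name weighted_percentiles_of_scores is hypothetical.

```python
import numpy as np

def weighted_percentiles_of_scores(a, weights, scores):
    """Weak weighted percentile for each score (hypothetical helper)."""
    a = np.asarray(a)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(a, kind='stable')
    a_sorted = a[order]
    # Prepend 0 so index k holds the total weight of the k smallest values.
    cum_w = np.concatenate(([0.0], np.cumsum(w[order])))
    # side='right' counts how many sorted values are <= each score.
    counts = np.searchsorted(a_sorted, scores, side='right')
    return 100.0 * cum_w[counts] / cum_w[-1]

a = [1, 2, 3, 4]
weights = [2, 2, 3, 3]
result = weighted_percentiles_of_scores(a, weights, a)
```

Here result should match the values from the list comprehension, and the same call accepts scores that do not appear in a.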

Upvotes: 3
