Reputation: 15803
I'd like to pass weights to scipy.stats.percentileofscore. For example:
from scipy import stats
a = [1, 2, 3, 4]
val = 3
stats.percentileofscore(a, val)
This returns 75, since 75% of the values in a lie at or below val, which is 3.
I'd like to add weights, for example:
weights = [2, 2, 3, 3]
weightedpercentileofscore(a, val, weights)
Should return 70, since (2 + 2 + 3) / (2 + 2 + 3 + 3) = 7 / 10 of the weights fall at or below 3.
This should also work for decimal weights and large weights, so just expanding the arrays isn't ideal.
The question "Weighted percentile using numpy" is relevant, but it computes percentile values (e.g. the value at the 10th percentile) rather than the percentile rank of a specific value.
Upvotes: 1
Views: 554
Reputation: 8360
This should do the job.
import numpy as np

def weighted_percentile_of_score(a, weights, score, kind='weak'):
    npa = np.asarray(a)
    npw = np.asarray(weights)
    total = npw.sum()
    if kind == 'rank':  # Treated as 'weak' here, since we have weights.
        kind = 'weak'
    if kind not in ('strict', 'weak', 'mean'):
        raise ValueError("kind must be 'rank', 'weak', 'strict' or 'mean'")
    # Percentage of the total weight strictly below the score.
    strict = float(100 * npw[npa < score].sum() / total)
    if kind == 'strict':
        return strict
    # Percentage of the total weight at or below the score.
    weak = float(100 * npw[npa <= score].sum() / total)
    if kind == 'weak':
        return weak
    # kind == 'mean': average of the strict and weak percentages.
    return (strict + weak) / 2
a = [1, 2, 3, 4]
weights = [2, 2, 3, 3]
print(weighted_percentile_of_score(a, weights, 3)) # 70.0 as desired.
In essence, you sum the weights of the scores less than or equal to your threshold score, divide by the total sum of weights, and express the result as a percentage.
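Since the question also asks about decimal weights, here is a minimal sketch of that same ratio applied to fractional weights. The helper name weighted_share_at_or_below is my own, made up for illustration; it is not part of SciPy or NumPy.

```python
import numpy as np

def weighted_share_at_or_below(a, weights, score):
    """Percentage of the total weight on values <= score (hypothetical helper)."""
    values = np.asarray(a, dtype=float)
    w = np.asarray(weights, dtype=float)
    # The boolean mask picks out the weights of values at or below the score.
    return float(100.0 * w[values <= score].sum() / w.sum())

# Decimal weights: 0.5 + 0.25 of a total weight of 1.0 lies at or below 2.
result = weighted_share_at_or_below([1, 2, 3], [0.5, 0.25, 0.25], 2)
print(result)
```

No array expansion is needed, so large or non-integer weights cost the same as small integer ones.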
To get each value's corresponding weighted percentile as an array:
[weighted_percentile_of_score(a, weights, val) for val in a]
# [20.0, 40.0, 70.0, 100.0]
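The list comprehension rescans the whole array for every value. If that becomes a bottleneck, one possible alternative is to sort once and look each score up in a running total of the weights. This is only a sketch of the 'weak' variant, and weighted_percentiles_of_scores is a name I made up, not an existing library function.

```python
import numpy as np

def weighted_percentiles_of_scores(a, weights, scores):
    """Weak weighted percentile of each score (hypothetical helper)."""
    values = np.asarray(a, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(values, kind="stable")
    sorted_values = values[order]
    # Running total of weight over the sorted values, with a leading 0
    # so that scores below every value map to 0 percent.
    cum_w = np.concatenate(([0.0], np.cumsum(w[order])))
    # side="right" counts how many values are <= each score.
    counts = np.searchsorted(sorted_values, scores, side="right")
    return 100.0 * cum_w[counts] / cum_w[-1]

a = [1, 2, 3, 4]
weights = [2, 2, 3, 3]
result = weighted_percentiles_of_scores(a, weights, a)
print(result)
```

The input does not have to be sorted beforehand, and scores can be any values, not just elements of a.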
Upvotes: 3