neelshiv

Reputation: 6385

Is there a ranking metric based on percentages that favors larger magnitudes?

I have two groups, "in" and "out," and item categories that can be split up among the groups. For example, I can have item category A that is 99% "in" and 1% "out," and item B that is 98% "in" and 2% "out."

For each of these items, I actually have the counts that are in/out. For example, A could have 99 items in and 1 item out, and B could have 196 items that are in and 4 that are out.

I would like to rank these items based on the percentage that are "in," but I would also like to give some priority to items that have larger overall populations. This is because I would like to focus on items that are very relevant to the "in" group, but still have a large number of items in the "out" group that I could pursue.

Is there some kind of score that could do this?

Upvotes: 1

Views: 60

Answers (2)

neelshiv

Reputation: 6385

I ended up using Bayesian averaging, which was recommended in this post. The technique is briefly described in this Wikipedia article and more thoroughly described in this post by Evan Miller and this post by Paul Masurel.

In Bayesian averaging, "prior values" pull the numerator and denominator toward their expected values: the expected numerator and expected denominator are added to the actual numerator and denominator. When the actual numerator and denominator are small, the prior values have a larger impact because they make up a larger share of the new numerator and denominator. As the counts grow, the Bayesian average approaches the actual average, reflecting increased confidence in the observed data.

In my case, the prior value for the average was fairly low, which biased averages with small denominators downward.
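
To make the arithmetic concrete, here is a minimal sketch using the counts from the question. The prior rate (0.5) and prior weight (50 pseudo-observations) are illustrative assumptions, not the values I actually used, and the function name `bayesian_average` is just a label for this example.

```python
def bayesian_average(in_count, total, prior_rate, prior_weight):
    """Shrink in_count / total toward prior_rate.

    prior_weight acts like a number of pseudo-observations: the expected
    numerator (prior_rate * prior_weight) and expected denominator
    (prior_weight) are added to the actual counts.
    """
    return (in_count + prior_rate * prior_weight) / (total + prior_weight)


# Counts from the question: A is 99 in / 1 out, B is 196 in / 4 out.
items = {"A": (99, 100), "B": (196, 200)}

PRIOR_RATE = 0.5    # assumed prior "in" rate, chosen only for illustration
PRIOR_WEIGHT = 50   # assumed strength of the prior, in pseudo-observations

scores = {
    name: bayesian_average(in_count, total, PRIOR_RATE, PRIOR_WEIGHT)
    for name, (in_count, total) in items.items()
}

# Highest score first.
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these assumed priors, A scores 124/150 (about 0.827) and B scores 221/250 (0.884), so B ranks first even though its raw percentage (98%) is lower than A's (99%). A smaller prior weight would let the raw percentages dominate again.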

Upvotes: 1

Ted Hopp

Reputation: 234847

I'd be tempted to use a probabilistic rank: the probability that an item category is from the group given the actual numbers for that category. This requires making some assumptions about the data set, including why a category may have any out-of-group items. You might take a look at the binomial test or the Mann-Whitney U test for a start. You might also look at some other kinds of nonparametric statistics.
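
As one possible starting point, here is a rough sketch of ranking by a one-sided binomial test. It assumes each item is independently "in" with some baseline rate (0.9 here, picked purely for illustration) and asks how surprising each category's "in" count would be under that baseline. The helper name `binomial_tail` and the baseline are assumptions, not something derived from your data.

```python
from math import comb


def binomial_tail(in_count, total, baseline_rate):
    """Return P(X >= in_count) for X ~ Binomial(total, baseline_rate).

    This is the one-sided p-value for seeing at least in_count "in" items
    if the category's true "in" rate were only baseline_rate.
    """
    return sum(
        comb(total, k) * baseline_rate**k * (1 - baseline_rate) ** (total - k)
        for k in range(in_count, total + 1)
    )


# Counts from the question: A is 99 in / 1 out, B is 196 in / 4 out.
categories = {"A": (99, 100), "B": (196, 200)}

BASELINE_RATE = 0.9  # assumed baseline "in" rate, for illustration only

p_values = {
    name: binomial_tail(in_count, total, BASELINE_RATE)
    for name, (in_count, total) in categories.items()
}

# A smaller p-value means stronger evidence of an above-baseline "in" rate.
ranked = sorted(p_values, key=p_values.get)
```

Under this assumed baseline, B gets the smaller p-value, so the larger category ranks ahead of A despite its slightly lower percentage. The result depends heavily on the baseline you choose, so treat it as a way to explore the idea rather than a finished metric.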

Upvotes: 1
