Need to calculate the percentage of distribution

Question

I have a set of numbers for a given set of attributes:

red    = 4
blue   = 0
orange = 2
purple = 1

I need to calculate the distribution percentage. Meaning, how diverse is the selection? Is it 20% diverse? Is it 100% diverse (meaning an even distribution of say 4,4,4,4)?

I'm trying to create a sexy percentage that approaches 100% as the individual values get closer to the same value, and drops the more lopsided they become.

Has anyone done this?

Here is the PHP conversion of the below example. For some reason it's not producing 1.0 with a 4,4,4,4 example.

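$arrayChoices = array(4, 4, 4, 4);

// Sum the counts.
$sum = 0;
foreach ($arrayChoices as $p) {
    $sum += $p;
}

print "sum: " . $sum . "\n";

// Convert each count to a proportion of the total.
$pArray = array();
foreach ($arrayChoices as $rec) {
    print "p vector value: " . $rec . " " . ($rec / $sum) . "\n";
    array_push($pArray, $rec / $sum);
}

// Shannon entropy in bits; zero proportions are skipped.
$total = 0;
foreach ($pArray as $p) {
    if ($p > 0) {
        $total = $total - $p * log($p, 2);
    }
}

print "total = $total\n";

// Divide by the maximum entropy, log2(number of categories), and print as a percentage.
print round($total / log(count($pArray), 2) * 100);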
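
As a cross-check, here is a minimal sketch of the same normalized-entropy calculation as a function that returns a fraction between 0 and 1 instead of printing a rounded percentage. The function name `normalizedEntropy` and the return value for degenerate input (fewer than two categories, or all counts zero) are assumptions made for illustration.

```php
<?php
// Normalized Shannon entropy of a list of counts.
// Returns a value near 1.0 for an even spread and 0.0 when one category holds everything.
// The function name is hypothetical.
function normalizedEntropy(array $counts): float
{
    $c = count($counts);
    $sum = array_sum($counts);

    // Assumption: treat degenerate input as "no diversity".
    if ($c < 2 || $sum == 0) {
        return 0.0;
    }

    $entropy = 0.0;
    foreach ($counts as $count) {
        if ($count > 0) {
            $p = $count / $sum;           // proportion in this category
            $entropy -= $p * log($p, 2);  // contribution in bits
        }
    }

    // log($c, 2) is the largest entropy possible with $c categories.
    return $entropy / log($c, 2);
}

print round(normalizedEntropy([4, 4, 4, 4]) * 100) . "\n";
print round(normalizedEntropy([4, 0, 2, 1]) * 100) . "\n";
```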

Thanks in advance!

thus spake a.k. · Accepted Answer

A simple, if rather naive, scheme is to sum the absolute differences between your observations and a perfectly uniform distribution, in which each of the 4 categories would hold 7/4 of the 7 items:

red    = abs(4 - 7/4) = 9/4
blue   = abs(0 - 7/4) = 7/4
orange = abs(2 - 7/4) = 1/4
purple = abs(1 - 7/4) = 3/4

for a total of 5.
A perfectly even spread will have a score of zero which you must map to 100%.
Assuming you have n items in c categories, a perfectly uneven spread will have a score of

(c-1)*n/c + 1*(n-n/c) = 2*(n-n/c)

which you should map to 0%. For a score d, you might use the linear transformation

100% * (1 - d / (2*(n-n/c)))

For your example this would result in

100% * (1 - 5 / (2*(7-7/4))) = 100% * (1 - 10/21) ~ 52%
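
The scheme above can be sketched in PHP as follows. This is only an illustration: the function name `evennessPercent` and the value returned for degenerate input (fewer than two categories, or no items at all) are my own assumptions, not part of the formula.

```php
<?php
// Evenness score from summed absolute deviations, mapped linearly to 0..100.
// The function name is hypothetical.
function evennessPercent(array $counts): float
{
    $c = count($counts);       // number of categories
    $n = array_sum($counts);   // total number of items

    // Assumption: evenness is undefined here; returning 100.0 is an arbitrary choice.
    if ($c < 2 || $n == 0) {
        return 100.0;
    }

    $expected = $n / $c;       // per-category count under a uniform spread

    // d: sum of absolute differences from the uniform count
    $d = 0.0;
    foreach ($counts as $count) {
        $d += abs($count - $expected);
    }

    // Score when every item sits in a single category: 2*(n - n/c)
    $worst = 2 * ($n - $n / $c);

    return 100.0 * (1 - $d / $worst);
}

printf("%.1f%%\n", evennessPercent([4, 0, 2, 1])); // 52.4%
printf("%.1f%%\n", evennessPercent([4, 4, 4, 4])); // 100.0%
printf("%.1f%%\n", evennessPercent([7, 0, 0, 0])); // 0.0%
```

Because the mapping is linear in d, moving one item between categories always shifts the score by the same amount, which is part of what makes the scheme naive.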

Better yet (although more complicated) is the Kolmogorov–Smirnov statistic, with which you can make mathematically rigorous statements about the probability that a set of observations has some given underlying probability distribution.
