Reputation: 195
I'm trying to reverse engineer a calculation done by an old program but can't quite reproduce it. I need a count of how many values fall into the top 27%, the middle 46%, and the bottom 27%.
I have the following data sets, each with eleven values. For each one, the counts the program reports for those three bands are listed first, followed by the values.
Upper 27%: 4, Middle 46%: 4, Lower 27%: 3
values: 8,9,10,11,11,11,11,12,12,12,13
Upper 27%: 5, Middle 46%: 4, Lower 27%: 2
values: 2,3,4,4,4,4,5,5,5,5,5
Upper 27%: 2, Middle 46%: 8, Lower 27%: 1
values: 2,4,4,4,4,4,4,4,4,5,5
Upper 27%: 2, Middle 46%: 6, Lower 27%: 3
values: 13,17,17,18,19,19,19,21,21,23,24
I have found formulas such as n * p, where n is the number of values and p is the percentile, but they don't reproduce the program's results across all of these data sets. I'm a little lost and haven't found anything that produces the results here.
I have tested code that I found on the internet, but none of it has worked on all of the data sets.
Code sample that I tried:
internal static double percentile(double[] sortedData, double p)
{
    if (p >= 100.0d) return sortedData[sortedData.Length - 1];
    double position = (double)(sortedData.Length + 1) * p / 100.0;
    double leftNumber = 0.0d, rightNumber = 0.0d;
    double n = p / 100.0d * (sortedData.Length - 1) + 1.0d;
    if (position >= 1)
    {
        leftNumber = sortedData[(int)System.Math.Floor(n) - 1];
        rightNumber = sortedData[(int)System.Math.Floor(n)];
    }
    else
    {
        leftNumber = sortedData[0]; // first data
        rightNumber = sortedData[1]; // first data
    }
    if (leftNumber == rightNumber)
        return leftNumber;
    else
    {
        double part = n - System.Math.Floor(n);
        return leftNumber + part * (rightNumber - leftNumber);
    }
}
Is there a formula or a name for what I am trying to do? Am I on the right track with percentile ranks?
Upvotes: 1
Views: 4548
Reputation: 1802
You're on the right path: this is indeed the percentile rank formula. My initial thought was to find the value at the 27th percentile (judging by your code, you started down this path as well) and count how many values were greater or less than it, but the results you provided don't fit that approach very well.
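To illustrate why the cutoff approach falls short, here is a minimal sketch. It assumes the nearest-rank definition of a percentile (other definitions exist and give slightly different cutoffs), and `NearestRank` and `CountByCutoff` are hypothetical helper names. On your second data set it does not reproduce the program's Upper 5, Middle 4, Lower 2:

```csharp
using System;
using System.Linq;

// Hypothetical helper: nearest-rank percentile of an ascending-sorted array.
// Assumption: nearest-rank is only one of several percentile definitions.
int NearestRank(int[] sorted, double p)
{
    var rank = (int)Math.Ceiling(p * sorted.Length);
    return sorted[Math.Max(rank, 1) - 1];
}

// Hypothetical helper: count values strictly below the 27th percentile value
// and strictly above the 73rd percentile value; everything else is "middle".
(int Lower, int Middle, int Upper) CountByCutoff(int[] values)
{
    var sorted = values.OrderBy(v => v).ToArray();
    var lowCut = NearestRank(sorted, 0.27);
    var highCut = NearestRank(sorted, 0.73);
    var lower = sorted.Count(v => v < lowCut);
    var upper = sorted.Count(v => v > highCut);
    return (lower, sorted.Length - lower - upper, upper);
}

var result = CountByCutoff(new[] { 2, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5 });
// Prints "Lower: 2, Middle: 9, Upper: 0", not the program's 2 / 4 / 5.
Console.WriteLine($"Lower: {result.Lower}, Middle: {result.Middle}, Upper: {result.Upper}");
```

Switching the comparisons to `<=` and `>=` changes the counts but still does not match that data set, because the cutoff values themselves are heavily repeated.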
So instead I calculated the percentile rank of each number and counted it as lower, middle, or upper depending on where that rank falls relative to the percentages above. This seems to be the approach the program took.
Formula (check out this website for more info):
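PR = ( L + 0.5 x S ) / N
(multiply by 100 to express it as a percentage)
Where,
L = number of values below the given value,
S = number of values equal to the given value (including itself),
N = total number of values.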
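As a worked example, here is a minimal sketch of the percentile rank of a single value; `PercentileRank` is a hypothetical helper name, and it returns a fraction between 0 and 1 rather than a percentage. For the value 21 in your last data set, 7 values are below it, 2 are equal to it, and there are 11 in total, so the rank is (7 + 0.5 x 2) / 11, roughly 0.727, which sits just under 0.73 (that is, 1 - 0.27):

```csharp
using System;
using System.Linq;

// Hypothetical helper: percentile rank of x within values, as a fraction in [0, 1].
double PercentileRank(int[] values, int x)
{
    var below = values.Count(v => v < x);   // L: values below x
    var same = values.Count(v => v == x);   // S: values equal to x
    return (below + 0.5 * same) / values.Length; // N = values.Length
}

var data = new[] { 13, 17, 17, 18, 19, 19, 19, 21, 21, 23, 24 };
Console.WriteLine(PercentileRank(data, 21)); // 8 / 11, roughly 0.727
```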
Code:
using System;
using System.Linq;

var lower = 0;
var middle = 0;
var upper = 0;
// Uncomment one data set at a time to test it.
// var values = new int[] { 8, 9, 10, 11, 11, 11, 11, 12, 12, 12, 13 };
// var values = new int[] { 2, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5 };
// var values = new int[] { 2, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5 };
var values = new int[] { 13, 17, 17, 18, 19, 19, 19, 21, 21, 23, 24 };
var n = values.Length;
foreach (var i in values)
{
    // Percentile rank of i as a fraction: (values below + half of the values equal) / total.
    var pr = ((values.Count(v => v < i) + (.5 * values.Count(v => v == i))) / n);
    if (pr < .27)
        lower += 1;
    else if (pr > .73)
        upper += 1;
    else
        middle += 1;
}
Console.WriteLine("Upper: " + upper);
Console.WriteLine("Middle: " + middle);
Console.WriteLine("Lower: " + lower);
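If it helps, the same loop can be wrapped in a function and run against all four of your data sets in one go. This is a sketch only: `CountBands` and its `cut` parameter are names I made up, and the 0.27 threshold (with 1 - 0.27 for the upper band) is taken from your question:

```csharp
using System;
using System.Linq;

// Hypothetical helper: bucket each value by its percentile rank.
(int Upper, int Middle, int Lower) CountBands(int[] values, double cut = 0.27)
{
    int lower = 0, middle = 0, upper = 0;
    foreach (var x in values)
    {
        // Percentile rank as a fraction: (values below + half of the values equal) / total.
        var pr = (values.Count(v => v < x) + 0.5 * values.Count(v => v == x)) / values.Length;
        if (pr < cut) lower++;
        else if (pr > 1 - cut) upper++;
        else middle++;
    }
    return (upper, middle, lower);
}

var dataSets = new[]
{
    new[] { 8, 9, 10, 11, 11, 11, 11, 12, 12, 12, 13 },
    new[] { 2, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5 },
    new[] { 2, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5 },
    new[] { 13, 17, 17, 18, 19, 19, 19, 21, 21, 23, 24 },
};

// Upper/Middle/Lower come out as 4/4/3, 5/4/2, 2/8/1 and 2/6/3,
// the same counts the old program reported.
foreach (var set in dataSets)
{
    var counts = CountBands(set);
    Console.WriteLine($"Upper: {counts.Upper}, Middle: {counts.Middle}, Lower: {counts.Lower}");
}
```

Counting with `Count` inside the loop is quadratic in the number of values, which is fine for eleven values; for large inputs you would want to sort or group first.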
Upvotes: 5