Reputation:
Given a long array of latencies which are in milliseconds, I want to calculate percentile from them. I got below method which does the work but I am not sure how I can verify whether this gives me accurate result?
public static long[] percentiles(long[] latencies, double... percentiles) {
Arrays.sort(latencies, 0, latencies.length);
long[] values = new long[percentiles.length];
for (int i = 0; i < percentiles.length; i++) {
int index = (int) (percentiles[i] * latencies.length);
values[i] = latencies[index];
}
return values;
}
I would like to get 50th, 95th, 99th and 99.9th percentile from latencies
array.
long[] percs = percentiles(latencies, 0.5, 0.95, 0.99, 0.999);
Is this the right way to get percentile given a long array of latencies? I am working with Java 7.
Upvotes: 17
Views: 45913
Reputation: 1948
If the array is sorted, you should just return the relative element in your array (e.g., p99 in a 1000-elements array is the 990th element).
If the array is unsorted, and in order to calculate percentiles more efficiently, you should probably use something like Quickselect.
Upvotes: 0
Reputation: 523
This is what you are looking for:
public static void main(String[] args) {
List<Long> latencies = new List<Long>() { 3, 6, 7, 8, 8, 9, 10, 13, 15, 16, 20 };
Collections.sort(latencies);
System.out.println(percentile(latencies, 25));
System.out.println(percentile(latencies, 50));
System.out.println(percentile(latencies, 75));
System.out.println(percentile(latencies, 100));
}
public static long percentile(List<Long> latencies, double percentile) {
int index = (int) Math.ceil(percentile / 100.0 * latencies.size());
return latencies.get(index-1);
}
Upvotes: 26
Reputation: 13620
public static double percentile(double percentile, List<Double> items) {
Preconditions.checkArgument(percentile >= 0);
Preconditions.checkArgument(percentile <= 100);
Preconditions.checkArgument(!items.isEmpty());
Collections.sort(items);
return items.get((int) Math.round(percentile / 100.0 * (items.size() - 1)));
}
@Test
public void test1() {
List<Double> list = Arrays.asList(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0);
assertThat(percentile(0, list)).isEqualTo(0.0);
assertThat(percentile(20, list)).isEqualTo(2.0);
assertThat(percentile(80, list)).isEqualTo(8.0);
assertThat(percentile(100, list)).isEqualTo(10.0);
}
@Test
public void test2() {
List<Double> list = Arrays.asList(0.0, 1.0, 2.0, 3.0);
assertThat(percentile(51, list)).isEqualTo(2.0);
assertThat(percentile(49, list)).isEqualTo(1.0);
}
@Test
public void test3() {
List<Double> list = Arrays.asList(42.0);
assertThat(percentile(0, list)).isEqualTo(42.0);
assertThat(percentile(100, list)).isEqualTo(42.0);
}
Upvotes: 4
Reputation: 31699
According to Wikipedia, there is no standard definition of percentile; however, they give a few possible definitions. The code you've posted appears to be closest to the Nearest Rank Method, but it's not quite the same.
The formula they give is
n = ceiling((P / 100) x N)
where N
is the length of the list, P
is the percentile, and n
will be the ordinal rank. You've already done the division by 100. Looking at the examples they give, it's clear that the "ordinal rank" is the index in the list, but it's 1-relative. Thus, to get an index into a Java array, you'd have to subtract 1. Therefore, the correct formula should be
n = ceiling(percentile * N) - 1
Using the variables in your code, the Java equivalent would be
(int) Math.ceil(percentiles[i] * latencies.length) - 1
This is not quite the code you've written. When you cast a double
to an int
, the result is rounded toward 0, i.e. it's the equivalent of the "floor" function. So your code computes
floor(percentiles[i] * latencies.length)
If percentiles[i] * latencies.length
is not an integer, the result is the same either way. However, if it is an integer, so that "floor" and "ceiling" are the same value, then the result will be different.
An example from Wikipedia is to compute the 40th percentile when the list is {15, 20, 35, 40, 50}. Their answer is to find the second item in the list, i.e. 20, because 0.40 * 5 = 2.0, and ceiling(2.0) = 2.0.
However, your code:
int index = (int) (percentiles[i] * latencies.length);
will result in index
being 2, which isn't what you want, because that will give you the third item in the list, instead of the second.
So in order to match the Wikipedia definition, your computation of the index will need to be modified a little. (On the other hand, I wouldn't be surprised if someone comes along and says your computation is correct and Wikipedia is wrong. We'll see...)
Upvotes: 2