Reputation: 81
I'm new in awk scripting and would like to have some help in calculating 95th percentile value for a file that consist of this data:
0.0001357
0.000112
0.000062
0.000054
0.000127
0.000114
0.000136
I tried:
cat filename.txt | sort -n |
awk 'BEGIN{c=0} {total[c]=$1; c++;} END{print total[int(NR*0.95-0.5)]}'
but I dont seem to get the correct value when I compare it to excel.
Upvotes: 8
Views: 19263
Reputation: 791
Just for the record, there is also solution, inspired by merlin2011 answer, that prints several desired percentiles:
# get amount of values
num="$(wc -l input.txt | cut -f1 -d' ')";
# sort values
sort -n input.txt > temp && mv temp input.txt
# print the desired percentiles
for p in 50 70 80 90 92 95 99 100; do
printf "%3s%%: %-5.5sms\n" "$p" "$(head input.txt -n "$((num / 100 * $p))" | tail -n1)";
done
Update: I messed it up. Bash math can't handle floating numbers, even not if used during a "single expression". That only works for files with 100*(N>0)
values. So either bc
or awk
is required to do the math.
In case you have an "odd" amount of values, you should replace "$((num / 100 * $p))"
with "$(awk "BEGIN {print int($num/100*$p)}")"
in the code above.
Finally awk
is part of that answer. ;)
Upvotes: 0
Reputation: 7959
Following the calculation suggested here, you can do this:
sort file -n | awk 'BEGIN{c=0} length($0){a[c]=$0;c++}END{p5=(c/100*5); p5=p5%1?int(p5)+1:p5; print a[c-p5-1]}'
Output for given input:
sort file -n | awk 'BEGIN{c=0} length($0){a[c]=$0;c++}END{p5=(c/100*5); p5=p5%1?int(p5)+1:p5; print a[c-p5-1]}'
0.0001357
Explanation:
PS. The statement p5=p5%1?int(p5)+1:p5
is doing a ceil
operation available in many languages.
Upvotes: 2
Reputation: 75585
I am not sure if Excel does some kind of weighted percentile, but if you actually want one of the numbers that was in your original set, then your method should work correctly for rounding.
You can simplify a little bit like this, but it's the same thing.
sort -n input.txt | awk '{all[NR] = $0} END{print all[int(NR*0.95 - 0.5)]}'
Upvotes: 13