Venkatesh Subramaniam
Venkatesh Subramaniam

Reputation: 63

Calculate percentile column from a numeric column

I am trying to create a calculated column called percentile_Idle_Time(I am trying to calculate percentile for every value). The column is percentile value from idle_time% column.

So, the input data is

Total Time   Idle Time    Idle Time %      
 5:10:00      0:14:00       4.6%
 3:09:00      0:20:00       9.49%
   .             .            .
   .             .            .

So, I am trying to create a new column called percentile_Idle_Time which is nothing but the percentile position of the Idle Time % value

So, the output data should be like

Total Time   Idle Time    Idle Time %    percentile_Idle_Time  
 5:10:00      0:14:00       4.6%             75.4
 3:09:00      0:20:00       9.49%            97.9
   .             .            .               .
   .             .            .               .

Note: The numbers are pretty rough(not accurate) I tried using

 proc univariate data=WORK.QUERY_FOR_PEOPLENET_DATA_00_0000 noprint;
 by DriverId;
 var 'Short Idle Time %'n;
 output pctlpre=P_ ;
 run;

But its not working. The other challenge is to take percentile score from the % column

Upvotes: 0

Views: 1019

Answers (1)

Reeza
Reeza

Reputation: 21264

Do it manually then. Sort the data ascending and use the NOBS to get the number of observations. Use n to divide by NOBS to get the total value.

proc sort data=sashelp.class out=class;
by weight;
run;


data want;
  set class Nobs=myobs;

   percentile = _n_ / myobs;
run;

NOTE that this does not deal with ties. If you do have ties that need to be dealt with, use PROC RANK instead. I usually do it with a group of 100 and then you'll get the 1 to 100 groups. But its' 96.5 percentile if that's what you're looking for.

 proc rank data=sashelp.class out=ranked_class groups=100;
 var weight;
 ranks  weight_percentile;
 run;

EDIT: Fixed references in data step and sort to align.

Upvotes: 1

Related Questions