Frederi ROSE
Frederi ROSE

Reputation: 351

How to filter a dataset according to its quantile

In the following code, how could I keep only the observations superior to the 95th quantile?

data test;
input business_ID $ count;
datalines;
'busi1' 2
'busi1' 10
'busi1' 4
'busi2' 1
'busi3' 2
'busi3' 1
;
run;

proc sort data = test;
    by descending count;
run;

I don't know how to cleanly stock the quartile and then re-use it with an if condition.

Thanks

Edit : I can determine the quantile with this code :

proc means data=test noprint; 
var count;
output out=quantile P75= / autoname;
run;

But how can I relate to it in the Test dataset so that I can select every observations above that quantile?

Upvotes: 1

Views: 829

Answers (2)

pinegulf
pinegulf

Reputation: 1396

User2877959's solution is solid. Recently I did this with Proc Rank. The solution is a bit 'work around-y', but saves a lot of typing.

proc rank data=Input groups=1000 out=rank_out; 
    var var_to_rank; 
    ranks Rank_val; 
run;

data seventy_five;
    set rank_out;
    if rank_val>750;
run;

More on Rank: http://documentation.sas.com/?docsetId=proc&docsetTarget=p0le3p5ngj1zlbn1mh3tistq9t76.htm&docsetVersion=9.4&locale=en

Upvotes: 0

user2877959
user2877959

Reputation: 1792

You could either read the value of the quantile in a macro variable to use in a subsequent if or where condition:

proc means data=test noprint; 
var count;
output out=quantile P75= / autoname;
run;

data _null_;
set quantile;
call symput('quantile',count_p75);
run;

data test;
set test;
where count > &quantile.;
run;

or you could use an SQL subquery

proc means data=test noprint; 
var count;
output out=quantile P75= / autoname;
run;

proc sql undo_policy=none;
create table test as
select *
from test
where count > (select count_p75 from quantile)
;
quit;

(Note that your question mentions the 95th quantile whereas your sample code mentions the 75th)

Upvotes: 1

Related Questions