Reputation: 351
In the following code, how could I keep only the observations superior to the 95th quantile?
data test;
input business_ID $ count;
datalines;
'busi1' 2
'busi1' 10
'busi1' 4
'busi2' 1
'busi3' 2
'busi3' 1
;
run;
proc sort data = test;
by descending count;
run;
I don't know how to cleanly stock the quartile and then re-use it with an if condition.
Thanks
Edit : I can determine the quantile with this code :
proc means data=test noprint;
var count;
output out=quantile P75= / autoname;
run;
But how can I relate to it in the Test
dataset so that I can select every observations above that quantile?
Upvotes: 1
Views: 829
Reputation: 1396
User2877959's solution is solid. Recently I did this with Proc Rank. The solution is a bit 'work around-y', but saves a lot of typing.
proc rank data=Input groups=1000 out=rank_out;
var var_to_rank;
ranks Rank_val;
run;
data seventy_five;
set rank_out;
if rank_val>750;
run;
More on Rank: http://documentation.sas.com/?docsetId=proc&docsetTarget=p0le3p5ngj1zlbn1mh3tistq9t76.htm&docsetVersion=9.4&locale=en
Upvotes: 0
Reputation: 1792
You could either read the value of the quantile in a macro variable to use in a subsequent if
or where
condition:
proc means data=test noprint;
var count;
output out=quantile P75= / autoname;
run;
data _null_;
set quantile;
call symput('quantile',count_p75);
run;
data test;
set test;
where count > &quantile.;
run;
or you could use an SQL subquery
proc means data=test noprint;
var count;
output out=quantile P75= / autoname;
run;
proc sql undo_policy=none;
create table test as
select *
from test
where count > (select count_p75 from quantile)
;
quit;
(Note that your question mentions the 95th quantile whereas your sample code mentions the 75th)
Upvotes: 1