Reputation: 14370
Say I have a SAS table tbl with a column col. This column holds different values, say {"a","s","d","f",...}, but one of them (say "d") is MUCH more frequent than the others. How can I select only the rows with this value?
It would be something like:
data tbl;
set tbl;
where col eq "the most present element of col in this case d";
run;
Upvotes: 1
Views: 82
Reputation: 1304
I would use PROC SQL for this.
Here's an example that gets "d" into a macro variable and then filters the original dataset, as requested in your question.
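This also works if there is a multi-way tie for the most frequent value: every tied value is kept. In the sample data below, b and c tie at four observations each, so both are selected.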
data tbl;
input col: $1.;
datalines;
a
a
b
b
b
b
c
c
c
c
d
d
d
;
run;
proc sql noprint;
  /* count how often each value of col occurs */
  create table tbl_freq as
  select col, count(*) as freq
  from tbl
  group by col;
  /* collect the most frequent value(s) as a quoted, comma-separated list */
  select quote(col) into :mode_values separated by ', '
  from tbl_freq
  where freq = (select max(freq) from tbl_freq);
quit;
%put mode_values = &mode_values.;
data tbl_filtered;
  set tbl;
  where col in (&mode_values.);
run;
Note the use of QUOTE(), which is needed to wrap the values of col in quotation marks (omit this if col is a numeric variable).
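If the macro variable is not needed elsewhere, the same filter can be sketched with a subquery, which sidesteps the quoting question because the values never pass through macro text. This is a variant under stated assumptions rather than part of the answer above: it reuses the tbl_freq table created earlier, and the output name tbl_filtered_sql is made up for illustration.

```sas
proc sql;
  /* tbl_filtered_sql is a hypothetical output name */
  create table tbl_filtered_sql as
  select t.*
  from tbl as t
  where t.col in
    (select col
     from tbl_freq
     where freq = (select max(freq) from tbl_freq));
quit;
```

With the sample data this should keep the same rows as tbl_filtered, that is, the b and c observations.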
Upvotes: 1
Reputation: 63424
One of many methods to accomplish this...
data test;
n+1;
input col $;
datalines;
a
b
c
d
d
d
d
e
f
g
d
d
a
b
d
d
;
run;
proc freq data=test order=freq; *order=freq automatically puts the most frequent on top;
  tables col / out=test_count;
run;
data want;
  set test;
  if _n_ = 1 then set test_count(keep=col rename=(col=col_keep)); *read the top row once, its value is retained;
  if col = col_keep;
run;
To put this into a macro variable (see comments):
data _null_;
  set test_count;
  call symputx("mvar", col); *put it to a macro variable, stripping trailing blanks;
  stop; *only want the first row;
run;
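As a minimal sketch of how that macro variable might then be used, assuming the step above has already run and that col is a character variable (so the value has to be wrapped in double quotes for the macro reference to resolve), the original dataset can be filtered like this. The output name want_mvar is hypothetical.

```sas
%put mvar = &mvar;

data want_mvar; /* hypothetical output name */
  set test;
  where col = "&mvar"; /* double quotes so that &mvar resolves */
run;
```

For the sample data above, this should keep only the rows where col is d.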
Upvotes: 3