Reputation: 59
I am working on an assignment where I need to identify the most frequent number across a range of variables. If there is a tie between two number, I also need SAS to return the highest value of the two most frequent number.
Using this answer (https://communities.sas.com/t5/General-SAS-Programming/Find-most-frequent-response-across-multiple-variables/td-p/269774), I know how to identify the most frequent number if there isn't a tie between two number. I now only need SAS to return the highest number if there is a tie. I think the problem arises in the last line before the 'run'-statement.
data have;
input id 1 x1 $ 4-5 x2 $ 7-8 x3 $ 10-11 x4 $ 13-14 x5 $ 16-17;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 03
4 02 01 02 01
5 01 02 03 04
;
run;
data want;
set have;
length MostFreq $2;
array x x:;
array _t[10] _temporary_;
call missing(of _t[*]);
do _n_=1 to dim(x);
if x[_n_] ne ' ' then _t[input(x[_n_],2.)]+1;
end;
Count=max(of _t[*]);
MostFreq=whichn(Count, of _t[*]);
run;
Upvotes: 1
Views: 213
Reputation: 1394
Well, I have to say, The code you give is a very excellent example for whichn
usage. And I will also praise the usage of array _t
, really nice thought!
For the question itself, here is my answer.
data have;
input id 1 x1 $ 4-5 x2 $ 7-8 x3 $ 10-11 x4 $ 13-14 x5 $ 16-17;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 03
4 02 01 02 01
5 01 02 03 04
;
run;
data want;
set have;
array x x1-x5;
array y y1-y5;
do i = 1 to dim(x);
y[i] = count(catx('@',of x[*]),cats(x[i]));
end;
count = max(of y[*]);
do i = 1 to dim(x);
if y[i] = count then highest = highest <> input(x[i],best.);
end;
drop y: i;
run;
The assignment of y[i]
assumed that x1
to x5
are between 0 and 9. If not, there should be some more restrict:
y[i] = count('@'||catx('@',of x[*]),catx('@','',x[i]));
Upvotes: 1
Reputation: 27508
WhichN
will return the index of only the first (left to right) occurrence, so when there is a MODE
tie you will not get the highest.
You can compute highest mode and count of mode at frequency bin update time.
data have;
input id (x1-x5) ($CHAR2. +1);
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 03
4 02 01 02 01
5 01 02 03 04
;
data want;
set have;
label
hmode_n = 'Mode (count)'
hmode = 'Mode (highest)'
;
array x x1-x5;
array bins[00:99] _temporary_; * freq table for two digit numbers;
do index = 1 to dim(x);
if missing(x[index]) then continue;
value = input(x[index],2.);
bins[value] + 1;
if bins[value] > hmode_n then do;
hmode_n = bins[value];
hmode = value;
end;
else
if bins[value] = hmode_n and value > hmode then do;
hmode = value;
end;
end;
call missing(of bins(*));
drop index value;
run;
Upvotes: 2
Reputation: 12884
First, adjust your input dataset so that all values are numeric rather than character:
data have;
input id 1 x1 x2 x3 x4 x5;
datalines;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 . 03 .
4 02 01 02 01
5 01 02 03 .04
;
run;
Next, transpose the data by id
. This will make it easier to work with. Once it is in a long format, you can more easily feed the data into procs to handle the calculations for you.
proc transpose data=have
out=have2(rename=(col1 = value))
name=var;
by id;
var x1-x5;
run;
proc rank
can allow you to grab what you need.
proc rank data=have2
out=want
ties=high
;
by id;
var value;
ranks rank;
run;
proc sort data=want;
by id rank;
run;
proc univariate
is also an option to get your statistics of interest.
proc univariate data=have2;
by id;
id var;
var value;
run;
Upvotes: 1