invoketheshell
invoketheshell

Reputation: 3897

SAS: Most frequent value (Like a MODE) ties solved by recency?

I have data like this:

data mydata;
input ID $ Val $ Date;
datalines;
1  A  2010-12-01 
1  B  2010-12-03 
1  A  2010-12-04 
1  B  2010-12-08
2  X  2009-10-01 
2  X  2009-10-02 
2  Z  2009-10-03 
;
run;

I would like the mode returned where it exists. ID 1, however, doesn't have a true mode. In the case of ties where modes do not exist I would like the most recent Val to break the tie (as in id 1).

Desired OUTPUT:

ID Mode
1  B  
2  X  

I tried proc univariate (which only handles numeric modes, another problem) but this gives the dataset with mode null; which SAS has correct but is not the desired output. I would like to do this in a datastep.

CODE:

proc univariate data=mydata noprint;
class id;
var val;
output out=modetable mode=mode;
run;

OUTPUT:

ID Mode
1  
2  X

Upvotes: 1

Views: 1270

Answers (2)

invoketheshell
invoketheshell

Reputation: 3897

Here is proc sql solution I came up with although I like the selected solution better:

%macro modes(data, mode , tie , break, outset , lib );
proc sql;
create table &lib..&outset as
select &id, &mode
from (select &id, &mode, latest
   from(select &id, &mode, latest
       from(select &id, &mode, count(*) as n, &break.(&tie) as latest
           from &data
           where &mode is not null
           group by &id, &mode)
         group by &id
         having n = max(n))
    group by &id
    having latest= &break.(latest) )
; 
quit;
%mend modes;

%modes(data=mydata, mode=age , tie=somedateorvalue , break=max, outset=outtable , lib =mylib);
  • Tie : is the column that is used to break ties
  • break : should be min or max, if you want earliest or latest date or high or low values to break ties with

The rest should be self explanatory.

Upvotes: 0

Dirk Horsten
Dirk Horsten

Reputation: 3845

use IDgroup from proc means

An example of this statement can be fount in Identifying the Top Three Extreme Values with the Output Statistics

Let us extend the example data a little bit;

data myInput;
    infile datalines dsd delimiter='09'x;
    input 
        @1 ID 1. 
        @4 Val $1. 
        @7 Date yymmdd10.;
    format Date yymmdd10.;
    datalines;
2  X  2009-10-01
2  X  2009-10-02
2  Z  2009-10-03
3  C  2010-10-01
3  B  2010-10-03
3  A  2010-10-04
3  A  2010-12-01
3  B  2010-12-03
3  C  2010-12-04
;
run;

Now let us count the frequency and the last occurence of each ´Val´ for each ´ID´;

proc sql;
    create view myView as 
    select ID, Val, max(Date) as Date format=yymmdd10., count(*) as freq
    from myInput
    group by ID, Val;
run;

And finally, retain one Val for each ID, prefering the more frequent one and within equally frequent ones the most recent one;

proc means data=myView nway noprint;
   class ID;
   output out=myModes(keep= ID Mode)
          idgroup( max(freq Date) out[1] (Val)=Mode);
run;

proc print data=myModes;
run;

The result is;

ID  Mode
2   X
3   C

Upvotes: 1

Related Questions