Tristan Sun
Tristan Sun

Reputation: 299

SAS: How to split data into subsets with names in sequence.

I am using SAS to split dataset into subsets with the following naming convention: sub001, sub002, sub003 .... My dataset looks like

ID   var1
AA   ...
AA   ...
AB   ...
AC   ...
...  ...
AZ   ...
...  ...
ZZ   ...

I could do this manually.

data sub001 sub002.....sub676;
set data;
if id='AA' then output sub001;
.....
if id='ZZ' then output sub676;
run;

But is there any easy way to do this ? I assume %macro or %do could help here.

Upvotes: 0

Views: 465

Answers (1)

Stu Sztukowski
Stu Sztukowski

Reputation: 12899

If you are looking to split them all into a bunch of datasets, you can use a macro to do so. This is one of my favorite little tricks in macro programming:

Step 1: Read all distinct IDs into a space-separated macro variable

proc sql noprint;
    select distinct strip(upcase(id) )
    into :all_ids separated by ' '
    from have;
quit;

Step 2: Loop over the total words in &all_ids, filter have on each value of &all_ids

options nonotes;

%macro split;
    %do i = 1 %to %sysfunc(countw(&all_ids) );
        data sub_&i;
            set have;
            where upcase(ID) = "%scan(&all_ids, &i)";
        run;
    %end;
%mend;
%split;

options notes;

Because we are looping over the total number of words in the space-delimited macro variable &all_ids, there is a 1-1 correspondence between the value of &i and the position of each word in &all_ids. For example:

 &i  |        &all_ids      
LOOP   WORD NUMBER      WORD           
 1         1             AA
 2         2             AB
 3         3             AC

This isn't the most efficient way to do it since we are writing a new dataset to the disk every time, but unless your resources are extremely limited or you have a monstrous dataset in the hundreds and hundreds of gigabytes or more, it should complete quickly. If you need a way to do it with less I/O, I can help you with that as well.

Upvotes: 0

Related Questions