B K
B K

Reputation: 59

How can I split a dataset into multiple datasets based on a column (which consists of groups) in sas

For example for code looks like this:

data work.code;
input code_num $9. qty ;
datalines;
123456789 49
123456789 384
123456789 37
123456789 485
123456780 34
123456780 567
123456780 23
123456780 543
123456788 21
123456788 876
123456788 54
123456788 987
;
run;

I would like to break up this dataset into multiple datasets based on the code_num variable:

data code_num_1
123456789 49
123456789 384
123456789 37
123456789 485

data code_num_2
123456780 34
123456780 567
123456780 23
123456780 543

and etc. The amount of codes will vary in the future.

I tried applying the call execute function but cannot pass numbers through it...maybe a do loop?

Upvotes: 1

Views: 1251

Answers (2)

Tom
Tom

Reputation: 51621

No need to split the dataset to work with part of the data. Just use a WHERE statement.

proc surveyselect data=code ..... ;
  where code_num = "123456789";
  ...
run;

If the data is sorted (or indexed) you can frequently just use a BY statement to treat each group separately.

proc surveyselect data=code ..... ;
  by code_num ;
  ...
run;

Upvotes: 1

Stu Sztukowski
Stu Sztukowski

Reputation: 12944

Adapting this answer from SASnrd, you can use a hash table:

https://sasnrd.com/sas-split-dataset-by-group/

data _null_;
   if(_N_ = 1) then do;
      if(0) then set code;   
                           
      dcl hash h(dataset:"code(obs=0)", multidata:'Y');   
         h.definekey(all:'Y');                                     
         h.definedata(all:'Y');                                    
      h.definedone();                                           
   end;

   do until(last.code_num);                                     
      set code;                                             
      by code_num;
      h.add();                                                  
   end;

   id+1;
   out = cats('code_num_', id);

   h.output(dataset:out);                                   
   h.clear();                                                   
run;

The only difference is that we're creating a unique ID for each dataset name rather than using the by-group. Be sure that code_num is sorted in some logical order. You can use proc sort first or the notsorted option if it's always in the right order.

Upvotes: 2

Related Questions