Reputation: 307
I used the following code to automatically split the datasets within one library:
%macro split(sourcelib=,source=,result=);
proc sql noprint; /*read datasets in a library*/
create table mytables as
select *
from dictionary.tables
where libname = &sourcelib
order by memname ;
select count(memname)
into:numb
from mytables;
%let numb=&numb.; /*give a number to datasets in the library*/
select memname
into :memname1-:memname&numb.
from mytables;
quit;
%do i=1 %to &numb.;
proc sql noprint;
create table tmp&i as
select distinct date_l_, _ric
from &source.&&memname&i;
select count(*)
into :obs
from work.tmp&i.;
%let obs=&obs.; /*read the variable 'date_l_' and '_ric' in each dataset*/
select date_l_, _ric, catx("_", "&result.", substr(_ric, 1, 13), date_l_)
into :date_l_1-:date_l_&obs., :ric1-:ric&obs., :setname1-:setname&obs.
from work.tmp&i;
quit;
%end;
data
%do j = 1 %to &obs.; /*set rules for separated dataset*/
&&setname&j
%end;
;
set
%do i=1 %to &numb.;
&source.&&memname&i
%end;
;
select;
%do j = 1 %to &obs.;
when(_ric = "&&ric&i" and date_l_ = &&date_l_&i) output &&setname&j;
%end;
end;
%mend;
%split(sourcelib='DATA',source=DATA.,result=AXP.);
However, I ran into the error shown below:
According to the SAS Problem Note: If SAS has to acquire memory in order to process a direct access bound library, and memory has already been exhausted, error messages might occur in the SASLOG.
Since I have around 100 datasets containing various dates and RICs (variable names), it is impossible to split the datasets manually. How could I improve this code?
Upvotes: 0
Views: 531
Reputation: 51621
The key line in your photograph of the SAS log is the one that says:
NOTE: Table WORK.TMP4 created, with 17329 rows and 2 columns.
You cannot create seventeen thousand datasets in one data step. Why would you even want to do that? You will either need to do this in multiple data steps or, better still, find a different solution to your original problem.
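To illustrate the "multiple data steps" route, here is a hedged sketch that processes the distinct _ric/date_l_ pairs in fixed-size batches, so no single data step creates more than a few hundred output datasets. The names WORK.PAIRS, ALLDATA, and OUT_ are illustrative, not taken from the original post, and the code has not been tested against the original data:

```sas
/* Sketch: split the job into batches so no single data step
   has to create thousands of output datasets.
   Assumes WORK.PAIRS holds the distinct _ric/date_l_ pairs and
   ALLDATA is the combined source (both names are hypothetical). */
%macro split_batches(pairs=work.pairs, source=alldata, batchsize=200);
  proc sql noprint;
    select count(*) into :npairs from &pairs.;
  quit;
  %do start=1 %to &npairs. %by &batchsize.;
    %let stop=%sysfunc(min(&npairs., %eval(&start.+&batchsize.-1)));
    %let nbatch=%eval(&stop.-&start.+1);
    proc sql noprint;  /* load just this batch of pairs */
      select _ric, date_l_
        into :ric1-:ric&nbatch., :date1-:date&nbatch.
        from &pairs.(firstobs=&start. obs=&stop.);
    quit;
    data %do k=1 %to &nbatch.; out_&start._&k. %end; ;
      set &source.;
      select;
        %do k=1 %to &nbatch.;
          when(_ric="&&ric&k" and date_l_=&&date&k.) output out_&start._&k.;
        %end;
        otherwise;  /* rows belonging to other batches fall through */
      end;
    run;
  %end;
%mend split_batches;
```

The trade-off is that the combined source is re-read once per batch; the batch size lets you balance I/O passes against the number of datasets open in any one step.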
Upvotes: 0
Reputation: 12944
This is a disk space problem. The "I/O" dataset error tends to occur not because of memory issues but because of a lack of disk space. Your dataset has a lot of repeating values, which works in your favor for compression. There are three things you want to do:
1. Set the column lengths to only what is needed
This can be done for each variable with attrib or length statements. A way I've found useful is using the %squeeze() macro on the initial dataset, saving those attributes to a separate empty dataset, then using create table like to retain those attributes without having to run %squeeze() again. For example:
/* Shrink the dataset */
%squeeze(everything, everything_squeezed);
/* Save a copy of all these attributes in an empty dataset */
proc sql noprint;
create table everything_attribs like everything_squeezed;
quit;
This can be useful if your full dataset is updated or overwritten regularly, losing your attributes. Use everything_attribs as the first table in your set statements.
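As a sketch of that last point (everything_attribs follows the example above; new_extract is a hypothetical incoming dataset):

```sas
/* Listing the empty attribute table first makes its squeezed
   lengths and formats apply to the combined output dataset. */
data everything;
  set everything_attribs   /* zero rows, carries the attributes */
      new_extract;         /* hypothetical fresh data */
run;
```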
2. Use the compress dataset option
Compressing your dataset can save an incredible amount of space. SAS offers two algorithms: COMPRESS=CHAR (the same as COMPRESS=YES), which works well for character variables, and COMPRESS=BINARY, which works well for numeric variables. In your case, try both and see which gives the best results.
%do i=1 %to &numb.;
proc sql noprint;
create table tmp&i(compress=yes) as
select distinct date_l_, _ric
from &source.&&memname&i;
select count(*)
into :obs
from work.tmp&i.;
%let obs=&obs.; /*read the variable 'date_l_' and '_ric' in each dataset*/
select date_l_, _ric, catx("_", "&result.", substr(_ric, 1, 13), date_l_)
into :date_l_1-:date_l_&obs., :ric1-:ric&obs., :setname1-:setname&obs.
from work.tmp&i;
quit;
%end;
data
%do j = 1 %to &obs.; /*set rules for separated dataset*/
&&setname&j(compress=yes)
%end;
;
set
%do i=1 %to &numb.;
&source.&&memname&i
%end;
;
select;
%do j = 1 %to &obs.;
when(_ric = "&&ric&j" and date_l_ = &&date_l_&j) output &&setname&j;
%end;
otherwise;
end;
run;
%mend;
3. Check how much memory is being used
Turn on the system option fullstimer
to get a detailed view of the amount of memory used by each step. If too much memory is being used, you will need to take a different approach that uses less total memory.
For example, you are currently outputting everything in a single data step. You could convert that into multiple smaller data steps rather than one large one.
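For illustration, fullstimer can be switched on globally (a minimal sketch; the PROC SQL step is only a placeholder, and data.example is a hypothetical dataset):

```sas
/* Enable detailed resource reporting; each step's log notes will
   then include real/CPU time plus memory and OS memory figures. */
options fullstimer;

proc sql noprint;                    /* placeholder step to profile */
  create table work.tmp1(compress=yes) as
  select distinct date_l_, _ric
  from data.example;                 /* hypothetical dataset */
quit;

options nofullstimer;                /* switch it off again */
```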
Upvotes: 1