Reputation: 3928
Is it possible to avoid writing a file at every datastep in SAS?
For instance, I start with two SAS data sets called have1 and have2 on my HD. I then do these simple SAS data steps:
data have3;
merge have1 have2; by id;run;
data have3; set have3;
if id='5' then delete;run;
proc sort data=have3; by id;run;
proc summary data=have3;
by id;
output out=have4
sum(expense)=expense;
run;
Can I do the first two data steps and the proc sort in memory and then write only have4 to the HD? [In fact I merge using hash objects.]
have3 is a big data set, so if I can avoid writing the data to my HD at every data step, that would be great.
Upvotes: 1
Views: 95
Reputation: 194
There is another more primitive, but simple, way to clean up the various data sets your program produces. Proc datasets will not prevent files from being created, but you can use it to delete any data set that has outlived its usefulness. This example will delete have1 and have2.
proc datasets;
delete have1 have2;
run;
quit; /* proc datasets stays active until quit */
Upvotes: 0
Reputation: 63424
The broad answer to your question is that yes, you can avoid some steps; you can use a view to avoid writing out datasets, in some cases. You also could use a memory library (ramlib) to define a library in memory rather than on a hard disk.
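A minimal sketch of the memory-library idea, assuming SAS on Windows, where (to my knowledge) the LIBNAME statement accepts a MEMLIB option. The libref mem and the path are hypothetical, the directory is assumed to exist, and anything left in the library is assumed not to survive the session, so only have4 is written to a regular library here.

```sas
/* Assumption: SAS on Windows, LIBNAME ... MEMLIB is available.      */
/* The libref "mem" and the path below are hypothetical placeholders. */
libname mem 'C:\temp\memlib' memlib;

/* Intermediate data set lives in the memory library, not on disk. */
data mem.have3;
  merge have1 have2;
  by id;
  if id='5' then delete;
run;

/* Only the summarized result is written to the default library. */
proc summary data=mem.have3 nway;
  class id;
  var expense;
  output out=have4 sum(expense)=expense;
run;

libname mem clear;
```

Whether this helps depends on how much RAM is available relative to the size of have3.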
In your specific case, it seems like some of the processing is unnecessary, in any event.
data have3;
merge have1 have2; by id;run;
data have3; set have3;
if id='5' then delete;run;
proc sort data=have3; by id;run;
proc summary data=have3;
by id;
output out=have4
sum(expense)=expense;
run;
could be
data have3;
merge have1 have2;
by id;
if id='5' then delete;
run;
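proc summary data=have3 nway; /* nway keeps only the per-id rows, as by would */
class id;
var expense; /* name the analysis variable explicitly */
output out=have4 sum(expense)=expense;
run;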
Class doesn't require sorting, and works effectively like by in this case. There's no reason to separate the merge and the delete (even more efficient might be to use where= data set options on the incoming datasets).
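As a sketch of the where suggestion, the filter can be moved onto the incoming data sets with the where= data set option, so the unwanted rows are never read into the merge. This assumes id is a character variable present in both have1 and have2 and that both are sorted by id, as in the question.

```sas
/* Filter each input as it is read, instead of deleting after the merge. */
data have3;
  merge have1(where=(id ne '5'))
        have2(where=(id ne '5'));
  by id;
run;
```

Because id is the by variable, filtering both inputs this way should give the same rows as the delete statement.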
You could even define have3 as a view, if you wanted.
data have3 /view=have3; *other code is the same;
A dataset named have3 must not already exist in this case, or the step will fail.
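Putting the view version together, a rough sketch might look like the following. It assumes have1 and have2 are sorted by id and that any old have3 lives in WORK; the nway option and the var statement are additions for illustration, not part of the original answer's view line.

```sas
/* Drop any existing data set named have3; a view cannot replace it. */
proc datasets library=work nolist nowarn;
  delete have3;
quit;

/* The view stores only the instructions, not the merged rows. */
data have3 / view=have3;
  merge have1 have2;
  by id;
  if id='5' then delete;
run;

/* The merge runs here, streaming rows straight into proc summary. */
proc summary data=have3 nway;
  class id;
  var expense;
  output out=have4 sum(expense)=expense;
run;
```

With this layout only have4 is written out; the merge is re-executed each time the view is read.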
Upvotes: 4