Reputation: 717
I have inherited a 37,000+ line SAS script, not including additional inherited modules. With a beginner's knowledge of SAS and a moderate knowledge of Python, I was wondering whether this is considered excessive, as it seems to me, and whether there are recommendations on how to layer and break apart a typical SAS script.
Thanks
Upvotes: 1
Views: 584
Reputation: 8513
I believe that it's a better practice to ONLY put REPETITIVE blocks of code (or blocks of code that require macro processing) into %macro blocks. If a comment alone can suffice to describe the purpose of a section of code then most often that is a better approach.
Putting code that has no macro-related processing into macro blocks just obfuscates simple SAS code. That's like wrapping an existing function in your own user-defined function without adding any features to it.
37k LOC is definitely excessive (for a single program), and I'm sure the majority of it will be repetitive, so @RWill's answer is still very valid. I think a better approach, though, would be to separate the code into multiple files (each file with a specific purpose), which can then be called with the %include statement. This way you can avoid having many (potentially) unnecessary macros and still make the program more self-documenting.
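As a rough sketch of the %include approach, a short driver program could pull in one file per purpose. The folder and file names below are hypothetical placeholders, not anything taken from the inherited script:

```sas
/* main.sas: hypothetical driver program.                      */
/* The root path and all file names are assumed placeholders.  */
%let root = /path/to/project;

%include "&root./01_import.sas";     /* read the raw data         */
%include "&root./02_clean.sas";      /* cleaning and recoding     */
%include "&root./03_analysis.sas";   /* models and summaries      */
%include "&root./04_reports.sas";    /* tables and output         */
```

Each included file runs in the same session, so data sets and global macro variables created by an earlier file are still available to the later ones.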
Cheers Rob
Upvotes: 3
Reputation: 28391
@RWill makes good suggestions. Sectioning with %macro blocks is very helpful for running the code a piece at a time. Using Enterprise Guide (4.3 is what I use) also allows for breaking the code into sections. Code nodes can be created and linked so that the program can be run a piece at a time or the whole branch can be run. Also, the output data sets are visually connected to the program blocks.
Upvotes: 2
Reputation: 949
This is a common issue with SAS, as many people who write SAS programs have a background in analysis rather than in efficient programming. The first thing that I would suggest is placing a block of the code within a macro definition, then calling the macro as a test and ending the program with an ENDSAS statement before the rest of the code executes. This will allow you to run diagnostics to determine what that block of code does. Once you understand what the code is doing, you can decide whether to break it up into further macro definitions so that each macro performs one logically cohesive set of instructions.
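A minimal sketch of that workflow, assuming a made-up macro name (section1) and using the sashelp.class sample data as a stand-in for the inherited code:

```sas
/* Wrap one block of the inherited program in a macro definition. */
%macro section1;
    /* Stand-in for the original block; the real code would be    */
    /* pasted here unchanged.                                      */
    data work.step1;
        set sashelp.class;
        ratio = weight / height;
    run;
%mend section1;

/* Run only this block. */
%section1

/* Diagnostics on what the block produced. */
proc contents data=work.step1;
run;

proc print data=work.step1 (obs=10);
run;

/* Stop here so that nothing after this point executes. */
endsas;

/* ... rest of the inherited program ... */
```

Note that ENDSAS terminates the SAS job or session itself, so in an interactive session it closes the session rather than just halting the submitted code.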
Once you are into the program, you might find similar code repeated with a few differences, such as the choice of parameters, data set names, or variables. If you find this type of repetition, then you can probably design a macro that uses parameters to generalize the code. Afterward you can replace the repetitive instances with macro calls, changing the parameters as needed.
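For illustration, here is what such a generalization might look like. The macro name and its parameters (data=, var=, out=) are invented for this example, and sashelp.class stands in for the real data:

```sas
/* One macro replaces several near-identical PROC MEANS steps. */
%macro summarize(data=, var=, out=);
    proc means data=&data noprint;
        var &var;
        output out=&out mean=mean_&var n=n_&var;
    run;
%mend summarize;

/* Each formerly copy-pasted block becomes a single call. */
%summarize(data=sashelp.class, var=height, out=work.height_stats)
%summarize(data=sashelp.class, var=weight, out=work.weight_stats)
```

Keyword parameters make each call readable on its own, which helps when the calls end up scattered through a long program.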
Similarly, with a script that long, I would expect you to find places where inserting a DO loop will help add structure, whether within a macro definition or within a DATA step.
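The two kinds of loop might look roughly like this. The macro name, data set names, and the growth calculation are all invented for the example; sashelp.class is used only because it ships with SAS:

```sas
/* Macro %DO loop: generates one DATA step per age value. */
%macro split_by_age(first=, last=);
    %local a;
    %do a = &first %to &last;
        data work.class_age&a;
            set sashelp.class;
            where age = &a;
        run;
    %end;
%mend split_by_age;

%split_by_age(first=11, last=13)

/* DATA step DO loop: repeats a calculation inside a single step. */
data work.growth;
    do year = 1 to 5;
        value = 100 * 1.05**year;
        output;
    end;
run;
```

The macro loop writes repeated SAS code before it runs, while the DATA step loop iterates at run time inside one step, so which one fits depends on whether whole steps or just calculations are being repeated.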
Sometimes, though, a SAS program just requires that many lines of code. Breaking it up into logical steps with macro definitions will help its readability and maintenance, but it will do little to improve efficiency. In those cases, the least you should do is write a comment section near the top that briefly describes the purpose of the program, then add comments throughout the code to explain what each block of code is doing.
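One possible shape for such a header and section comments is sketched below. The program name, field layout, and section titles are placeholders, not a SAS convention:

```sas
/*------------------------------------------------------------------
  Program : example_report.sas        (hypothetical name)
  Purpose : One or two sentences on what the program produces.
  Inputs  : Source libraries and data sets the program reads.
  Outputs : Data sets, reports, or files the program writes.
  Notes   : Required macro variables, run order, known quirks.
------------------------------------------------------------------*/

/* ---- Section 1: read and subset the input data ---- */
data work.teens;
    set sashelp.class;
    where age >= 13;
run;

/* ---- Section 2: summary statistics for the subset ---- */
proc means data=work.teens;
    var height weight;
run;
```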
There are two caveats for this technique, though. First, if the original program defines macro variables, then you might have an issue with the scope of these variables once you define them within a macro. Macro variables defined outside of a macro definition are global in scope; but unless you specify that they should be global, macro variables defined within a macro definition (and macro parameters set when the macro is called) will be local and will not be available once the macro finishes executing. A related issue is that more than one macro variable can have the same name in different scopes, which can result in unexpected values when they are resolved.
The second caveat is that if you use a macro definition to comment out large blocks of code so that you can test later sections of the code in a different run, then make sure to save any necessary data sets to a permanent location; otherwise they will be deleted when the session ends and will not be available for the later testing.
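Both caveats can be seen in a small sketch. The macro name, variable names, and library path here are made up for illustration, and the comments assume a fresh session in which none of these macro variables already exist:

```sas
/* Caveat 1: macro variable scope. */
%let cutoff = 10;          /* defined in open code, so global */

%macro demo;
    %let tmp = 1;          /* no tmp exists yet, so it is created local */
    %global keep_me;       /* explicitly declared global                */
    %let keep_me = 2;
    %let cutoff = 20;      /* no local cutoff, so the global is updated */
%mend demo;

%demo

%put cutoff=&cutoff keep_me=&keep_me;   /* both are still available     */
%put tmp exists: %symexist(tmp);        /* tmp was local to %demo       */

/* Caveat 2: save intermediate data outside WORK.  */
/* The path is a placeholder for a real folder.    */
libname keep "/path/to/permanent/folder";

data keep.step1;
    set sashelp.class;     /* stand-in for an intermediate data set */
run;
```

A later session can then assign the same libname and read keep.step1, whereas anything left in WORK is gone once the session ends.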
Good luck.
Upvotes: 4