Reputation: 315
I'm curious about how SAS handles informats and input statements with informats. What's the "order of operations" of these statements? I included an example snippet from a program that SAS EG Import Wizard generated.
Disclaimer: I rarely use the EG Import Wizard, but my employer has asked that we use EG when possible, i.e. when creating new programs, so I was curious how this functionality works.
Data:
TimeStamp
01/01/2019 12:00:00 AM
Example EG Generated Code:
data Input;
length TimeStamp 4;
format TimeStamp mmddyy10.;
informat TimeStamp mmddyy10.;
...some infile statement...
input TimeStamp : Best32.;
TimeStamp = DatePart(TimeStamp);
run;
The above example is the code EG generated, but I'm curious as to why all these statements were generated. I'm also unsure why SAS used the : Best32. informat with the input statement when my Import Wizard states DateTime18.
Historically, using BASE SAS, I've just used:
Example of #1:
Data Test;
...infile...;
input @1 TimeStamp DateTime18.;
...format...;
run;
Example of #2:
Data Test2;
...infile...;
informat TimeStamp DateTime18.;
input TimeStamp;
...format...;
run;
Is Example #1 just shorthand for Example #2? If so, why is EG generating the extra steps? In the EG generated code, how is the informat statement not overriding the input statement's informat?
Upvotes: 0
Views: 3869
Reputation: 51566
The INFORMAT and FORMAT statements are not executable, so you can place them anywhere in the data step (apart from the side effect of forcing a type onto a variable that the compiler hasn't typed yet). Note this also means that if you assign multiple formats (or informats) to the same variable, the last one is what will be used.
When the INPUT statement executes, any explicit informat specification you have included in the INPUT statement itself will override any informat associated with the variable. Note again that if the variable has not already been typed by the compiler, then how the INPUT statement uses the variable will cause a type to be selected for it.
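To make the override concrete, here is a minimal sketch (the dataset name order_demo and the variables a and b are made up for illustration). Both variables get MMDDYY10. attached by the INFORMAT statement, but only a is actually read with it; b is read with the informat given in the INPUT statement.

```sas
data order_demo;
  /* Declarative: attaches attributes at compile time, wherever it appears */
  informat a b mmddyy10.;
  format   a b date9.;
  /* a: no informat in INPUT, so the attached MMDDYY10. is used         */
  /* b: the explicit :DATE9. in INPUT overrides the attached MMDDYY10.  */
  input a b :date9.;
  datalines;
01/01/2019 01JAN2019
;
run;

/* Both columns should hold the same date value */
proc print data=order_demo;
run;

/* The stored informat is still MMDDYY10. for both variables */
proc contents data=order_demo;
run;
```

The informat used in INPUT only affects how that one read is done; the attribute stored in the dataset is still whatever the INFORMAT statement attached.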
So for the most predictable results you should define your variables instead of letting SAS guess based on how they first appear. You can define them using the LENGTH statement or the ATTRIB statement, or by pulling in an existing dataset with SET, MERGE and other statements. Then the order of the INPUT, FORMAT and INFORMAT statements will not matter.
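As a rough illustration of defining variables up front, the sketch below uses ATTRIB before anything else touches the variables. The dataset name, the Name variable and the comma-delimited sample line are assumptions made for the example, not part of the original question.

```sas
data defined_first;
  /* Type, length, informat and format are all fixed here, so the   */
  /* position of any later INPUT/FORMAT/INFORMAT no longer matters. */
  attrib TimeStamp length=8   informat=mmddyy10. format=mmddyy10.;
  attrib Name      length=$20;
  infile datalines dsd truncover;
  /* No informats needed here: list input uses the attached ones */
  input TimeStamp Name;
  datalines;
01/01/2019,Alice
;
run;
```

A LENGTH statement followed by separate INFORMAT and FORMAT statements would do the same job; ATTRIB just keeps everything for a variable in one place.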
You would have to ask SAS why the Enterprise Guide Wizard works the way it works. My understanding is that for some files (like Excel spreadsheets) it will convert the data into a text file and upload the text file it generated. So I assume that EG generated the DATE and TIME values as the raw number of days or number of seconds and that is why it reads the value using the normal numeric informat instead of a date or time informat. I assume it attaches an INFORMAT to the date and time variables so that the metadata in the dataset definition are populated with something that matches the format that is attached.
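If that assumption is right, the generated step is equivalent to something like the following sketch, where the in-line value 1861920000 stands in for the intermediate text file and is the number of seconds from 01JAN1960 to 01JAN2019:00:00:00. The dataset name and the use of DATALINES instead of the elided INFILE are my own choices for the illustration.

```sas
data eg_style;
  length   TimeStamp 4;
  format   TimeStamp mmddyy10.;
  informat TimeStamp mmddyy10.;      /* metadata only; not used by INPUT below */
  input TimeStamp :best32.;          /* reads the raw number of seconds        */
  TimeStamp = datepart(TimeStamp);   /* datetime (seconds) -> date (days)      */
  datalines;
1861920000
;
run;

proc print data=eg_style;
run;
```

Under this reading, the MMDDYY10. informat never takes part in reading the data; it is only stored with the variable alongside the matching format.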
As to why they used the BEST32. informat, I have no idea. There is not really a BEST informat in SAS, so that is really just an alias for 32. (or they could have used F32.). The concept of "best" for an informat doesn't even really make sense. The BEST format is used to figure out, for a particular number, the best combination of digits to generate to approximate the value in a limited number of characters. For reading a string of characters into a number, SAS just needs to read the digits and convert them to the number they represent. There is no selection of any "best" alternative involved.
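A quick way to check this for yourself is to read the same text with all three spellings and compare the results. This is only a sketch; the dataset and variable names are arbitrary.

```sas
data best_alias;
  /* Same digits read with BEST32., 32. and F32. */
  input x :best32. y :32. z :f32.;
  /* 1 if all three informats produced the same number, 0 otherwise */
  same = (x = y and y = z);
  datalines;
1861920000 1861920000 1861920000
;
run;

proc print data=best_alias;
run;
```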
Upvotes: 2