user6037890

Reputation: 71

Understanding the difference between informat and format, and how _EFIERR_ works

data WORK.CC_2 ;
    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
    infile 'C:\Documents and Settings\DASC\Desktop\SUGI05_CC_1.csv' delimiter = ','
        MISSOVER DSD lrecl=32767 firstobs=2 ;
    informat q_1 $5. ;
    informat q2_6 best32. ;
    informat q7_9 $5. ;
    informat q8_1 best32. ;
    informat q8_3 $5. ;
    format q_1 $5. ;
    format q2_6 best12. ;
    format q7_9 $5. ;
    format q8_1 best12. ;
    format q8_3 $5. ;
    format check_77 $5. ;
    input
        q_1 $
        q2_6
        q7_9 $
        q8_1
        q8_3 $
    ;
    if _ERROR_ then call symput('_EFIERR_',1); /* set ERROR detection macro variable */
run;

Can someone help me understand how informat and format work here? Also, I am not sure I understand the _EFIERR_ macro variable.

Upvotes: 0

Views: 6477

Answers (3)

Tom

Reputation: 51611

I would word the answer a little differently than the others have.

The code is creating a macro variable _EFIERR_ that it initializes to 0 and then sets to 1 if there are any errors in the data step.
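The flag pattern can be tried without a CSV file. The following is only a minimal sketch: the data set name `work.demo`, the variable `x`, and the inline data are made up for illustration, and it uses CALL SYMPUTX rather than the CALL SYMPUT shown in the question so that the numeric 1 is stored without a conversion note.

```sas
%let _EFIERR_ = 0;                  /* start with "no error seen" */

data work.demo;                     /* hypothetical data set name */
  input x;                          /* x is read as a number */
  /* _ERROR_ is set to 1 when a value cannot be read, e.g. 'abc' below */
  if _ERROR_ then call symputx('_EFIERR_', 1);
  datalines;
1
abc
3
;
run;

%put NOTE: _EFIERR_ = &_EFIERR_;    /* inspect the flag after the step */
```

Because the second record cannot be read as a number, the step sets `_ERROR_` on that iteration and the macro variable is switched to 1.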

The INFORMAT tells SAS how to read the text characters in the file and convert them into the values that SAS stores. So $5. says to just store the characters read and BEST32. says to convert the text into a number.
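To see the two informats side by side, here is a small sketch that uses the INPUT function on a text value instead of reading a file. The variable names `text`, `num`, and `chr` are assumptions chosen for the example.

```sas
data _null_;
  text = '1234.5';               /* characters as they might appear in the CSV */
  num  = input(text, best32.);   /* numeric informat: text becomes a number */
  chr  = input(text, $5.);       /* character informat: keeps the first 5 characters */
  put num= chr=;                 /* write both results to the log */
run;
```

The same text yields a numeric value in one case and a 5-character string in the other, which is the choice the INFORMAT statements in the question make for each column.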

The FORMAT tells SAS how to display the values when it converts them back to characters for printing. So $5. says to print using 5 characters. BEST12. says to use the best representation of the number that fits in 12 characters. So integers that fit in 12 characters are just written normally. An integer that would take more than 12 characters to print is printed using scientific notation.
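A quick way to watch BEST12. at work is to print a short and a long integer with it. This is just an illustrative sketch; the variable names and values are invented.

```sas
data _null_;
  small = 12345;                  /* fits easily in 12 characters */
  big   = 1234567890123456;       /* 16 digits, too wide for 12 characters */
  put small= best12. big= best12.;
  /* the stored values are unchanged; only the displayed text differs */
  put big= best32.;               /* a wider format shows the full value again */
run;
```

The second PUT statement shows that the format only affects display: widening it brings back the digits that did not fit.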

This looks to be code generated by PROC IMPORT. I am not a fan of how PROC IMPORT generates the data step code.
1) The code does not explicitly define the variable types and lengths. Instead it depends on the side effect of the INFORMAT statement being the first place that the variable is referenced. So Q_1 gets defined as character with a length of 5 since it is using $5. informat.

2) It attaches formats to character variables. This does nothing productive since a character variable of length 5 will print in 5 spaces whether you have attached a format to it or not. But it can lead to problems when combining data from multiple sources. Suppose you read two CSV files and one happens to have Q_1 with length $8 while the second has Q_1 with length $5. If you set them together in that order, the new variable will have length $8 but will have the format $5. attached to it. So when you print the values they could be truncated.

3) The $'s are not needed in the INPUT statement since the program has already defined the variable via the INFORMAT statement.
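Putting those three points together, a hand-written version of the step might look roughly like the sketch below. It covers only the five variables shown in the question (the posted log skips some lines, so the real file may have more columns), and it leaves out the _EFIERR_ lines and the formats entirely.

```sas
data work.cc_2;
  infile 'C:\Documents and Settings\DASC\Desktop\SUGI05_CC_1.csv'
         dsd dlm=',' missover firstobs=2 lrecl=32767;
  /* define type and length explicitly instead of relying on INFORMAT */
  length q_1 $5 q2_6 8 q7_9 $5 q8_1 8 q8_3 $5;
  /* no $ needed: the LENGTH statement already made these character */
  input q_1 q2_6 q7_9 q8_1 q8_3;
run;
```

With list input the default informats are enough for plain numbers and short strings, so no INFORMAT or FORMAT statement is needed for these columns.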

Upvotes: 0

desmond.carros

Reputation: 362

Informats describe how the columns in the existing file should be read into SAS values.

Formats describe how SAS displays those values once they are stored.

It looks like this code was copied from the SAS log file, and _EFIERR_ flags read errors in the file. It is a macro variable generated automatically by the Import Wizard. It starts at 0 and is set to 1 if the automatic variable _ERROR_ is set while the data step reads the file, for example when a field cannot be read with its informat. In that sense it acts as a simple quality indicator for any data that you import using the Import Wizard.

Upvotes: 0

DomPazz

Reputation: 12465

INFORMAT describes how the data is presented in the text file.

FORMAT describes how you want SAS to present the data when you look at it. Remember, formats do not change the underlying data, just how it is printed for input into your gray matter computer.

This looks like it comes from PROC IMPORT. It uses that macro variable to detect whether an error occurred while reading in the file. If there is one, then it gives you the super helpful error message "An Error Occurred" (or something like that).

You can delete those _EFIERR_ lines from your program without side effects.

Upvotes: 2
