mebby

Reputation: 25

Why does SAS skip an entire row of data values due to missing value?

When I run the following code the third observation is not output. Why does SAS omit the third observation?

data info;
    input Gender $ Age Height Weight;
    datalines;
    M 45 72 149
    F  64 62
    M 61 72 271
    F 29 73 125
    M 16 65 178
    ;
Run;

title "Listing of Dataset Demographics";

proc print data=info;
run;

Upvotes: 1

Views: 823

Answers (2)

Tom

Reputation: 51566

Lines of text do not have "observations". They just have lines.

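It didn't skip any of the lines of data. It just used two lines for the second observation because the first of those lines only had values for 3 of the 4 variables the INPUT statement requested. SAS went looking for Weight on the next line, found M there (not a valid number, so Weight was set to missing), and then discarded the rest of that line.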

This behavior is what SAS calls FLOWOVER, which is the default setting of the INFILE statement. It lets you use more than one line of text to represent the data for a single observation without having to be too persnickety about where you insert the line breaks between fields across the different observations of data.

If you don't want it to have to go hunt for the next field on the next line of text then make sure every variable has a value in the text lines. You can represent missing values by using a period for either numeric or character variables.

So use something like this:

data info;
  input Gender $ Age Height Weight;
datalines;
M 45 72 149
F 64 62 .
M 61 72 271
. 29 73 125
M 16 65 178
;

When using FLOWOVER you can insert as many extra line breaks as you want, as long as each new observation starts on a new line. Like this:

data info;
  input Gender $ Age Height Weight;
datalines;
M 45 72 
149
F 64 
62  .
M 
61 72 271
F 29 73 125
M 16 65 178
;

If you want SAS to just give up when there are no more values on the line, use the TRUNCOVER option on the INFILE statement.

data info;
  infile datalines truncover;
  input Gender $ Age Height Weight;
datalines;
M 45 72 149
F 64 62 
M 61 72 271
F 29 73 125
M 16 65 178
;

There is also the older MISSOVER option, but you would normally never want that, as it will set values at the end of the line that are too short for an explicit INFORMAT width to missing instead of just using the number of characters that are available.
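To see that difference for yourself, here is a minimal sketch. It writes the test lines to a temporary file rather than using in-stream data, because lines after DATALINES are padded with blanks to at least 80 columns and so are never "too short". The fileref shortln and the dataset names are made up for illustration.

```sas
/* Hypothetical fileref pointing at a temporary file. */
filename shortln temp;

/* Write five unpadded lines with lengths 1 through 5. */
data _null_;
  file shortln;
  put '1';
  put '22';
  put '333';
  put '4444';
  put '55555';
run;

/* MISSOVER: a line shorter than the 5. informat width yields a missing value. */
data with_missover;
  infile shortln missover;
  input testnum 5.;
run;

/* TRUNCOVER: a short line is read using whatever characters are available. */
data with_truncover;
  infile shortln truncover;
  input testnum 5.;
run;

proc print data=with_missover;
run;

proc print data=with_truncover;
run;
```

With MISSOVER only the last line (the one that is a full five characters wide) should produce a non-missing TESTNUM, while TRUNCOVER should return all five values.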

PS Don't indent lines of data. That will just make the code harder to read and the diagnostic messages about invalid data values harder to interpret. To make it easier, don't indent the DATALINES (aka CARDS) statement line either. That will also make it clearer that the data step definition ends where the lines of data start, and prevent you from accidentally inserting other statements for the data step after the data.

Upvotes: 0

Reeza

Reputation: 21264

Defaults will get you. The default in SAS is FLOWOVER, so if a value is missing, SAS looks for it on the next line. You want MISSOVER or TRUNCOVER instead.

Your log tells you this happened with the following note:

 NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

This works:

data info;
    infile cards truncover;
    input Gender $ Age Height Weight;
    datalines;
    M 45 72 149
    F  64 62
    M 61 72 271
    F 29 73 125
    M 16 65 178
    ;
Run;

More details are available in Example 2 in the documentation here.

Specifically:

When you omit the MISSOVER option or use FLOWOVER (which is the default), SAS moves the input pointer to line 2 and reads values for TEMP4 and TEMP5 (variables it cannot find). The next time the DATA step executes, SAS reads a new line which, in this case, is line 3. This message appears in the SAS log:

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

Upvotes: 2
