Peetrius

Reputation: 203

Numbered range lists for character data in SAS

I'm trying to create variables Cap1 through Cap6, but I'm not sure how to have them read as character data. My code is:

DATA Capture;
    INFILE '/folders/myfolders/sasuser.v94/Capture.txt' DLM='09'x  DSD MISSOVER FIRSTOBS=2;
    INPUT Sex $ AgeGroup $ Weight Cap1 - Cap6 $;
RUN;

And my issue is Cap1 through Cap5 are interpreted as numerical data. How do I solve this?

Upvotes: 2

Views: 136

Answers (2)

Dirk Horsten

Reputation: 3845

Indeed, I would also expect this INPUT statement to work the way you wrote it, but it does not. Putting a $ after Cap1 as well does not resolve it either, as this log shows.

26             INPUT Sex $ AgeGroup $ Weight Cap1 $ - Cap6 $;
                                                    _
                                                    22
ERROR 22-322: Expecting a name.  

You can solve it by assigning a format to your variables before reading them, for instance format Cap1 - Cap6 $2.;

To test it, I included the data in the source file, i.e. using datalines:

DATA Capture;
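    /* The data lines below are tab-separated (DLM='09'x); tabs may display as spaces here. */
    INFILE datalines DLM='09'x  DSD missover FIRSTOBS=1;
    /* Declaring formats before INPUT fixes each variable's type: the $ formats make Cap1-Cap6 character. */
    format Sex $1. AgeGroup $9. Weight 8.2 Cap1 - Cap6 $2.;
    INPUT Sex AgeGroup Weight Cap1 - Cap6;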
    datalines;
M   1-5 24.5    11  12  13  14  15  16
M   6-10    34.2    21  22  23  24  25  26
;
proc print;
proc contents;
RUN;

How to understand this:

SAS was originally created as a programming language for non-developers (in this case statisticians) who would rather not care about data formats, so SAS does a lot of guesswork for you (just like VBA if you don't use Option Explicit).

So, the first time you mention a variable name in a data step, SAS adds a variable to the Program Data Vector (PDV) with an appropriate type (numeric or character) and length, but this is guesswork.

For instance, using the test dataset CLASS included in the standard installation of SAS, and because the first value assigned to gender in the code is 'male' (length 4),

data WORK.CLASS;
    set sasHelp.CLASS;
    select (sex);
        when ('M') gender = 'male';
        when ('F') gender = 'female';
        otherwise  gender = 'unknown';
    end;
run;

results in truncating 'female' to four positions ('fema').

You can correct that by instructing SAS to add the variable to the PDV beforehand, with the type and length you want.
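
As a minimal sketch of that correction (assuming 7 characters is enough, since 'unknown' is the longest value assigned), a LENGTH statement placed before the first assignment declares gender up front:

```sas
data WORK.CLASS;
    /* Declare gender before it is first assigned, so its length is not taken from 'male' */
    length gender $7;
    set sasHelp.CLASS;
    select (sex);
        when ('M') gender = 'male';
        when ('F') gender = 'female';
        otherwise  gender = 'unknown';
    end;
run;
```

With this declaration in place, 'female' and 'unknown' should be stored in full.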

For a character variable,

  • format myName $20.; and
  • length myName $20.; are equivalent and
  • informat myName $20.; is also about the same.

(The story becomes more complex with user-defined formats, though.)

For numerics, there is a huge difference:

  • length mySize 8.; reserves 8 bytes in the PDV for mySize
  • format mySize 8.; tells SAS to print or display mySize with up to 8 digits and no decimals
  • informat mySize 8.; tells SAS to expect 8 digits without decimals when reading mySize.

Numerics can only have certain lengths, depending on the operating system. On Windows

  • 8. is the default and corresponds to a double on most databases
  • 4. corresponds to a float
  • 3. is the minimum, which I use for booleans

Formats can be very different

  • format mySize 8.3; tells SAS to print mySize with 8 characters, including 3 decimals for the fraction (which leaves room for up to 4 digits before the decimal point if it has a positive value. Fewer decimals will be printed to display larger numbers)
  • informat mySize 8.3; tells SAS to read mySize assuming the last 3 digits are the fraction, so 12345678 will be interpreted as 12345.678
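
The difference between writing with 8.3 and reading with 8.3 can be tried out in a small step like the one below. This is only a sketch: the variable names x and y and the sample values are made up for illustration, and the INPUT function is used here to apply the informat to a text literal.

```sas
data _null_;
    /* Reading: the 8.3 informat implies 3 decimals when the text contains no decimal point */
    x = input('12345678', 8.3);
    put x= 10.3;

    /* Writing: the 8.3 format uses 8 characters in total, 3 of them for the fraction */
    y = 1234.5678;
    put y= 8.3;
run;
```

The log should show x=12345.678 and y=1234.568, the latter rounded to three decimals to fit the format.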

Then there are special formats to read and write dates, times and so on, and user-defined value and picture formats, but that would lead me too far.

Upvotes: 2

Joe

Reputation: 63424

Your issue is simple: you are using a variable list, but you aren't applying the $ to the whole variable list! You need ( ) around the list and the modifier to apply it to the whole list.

See:

DATA Capture;
    INFILE datalines DLM=' '  DSD;
    INPUT Sex $ AgeGroup $ Weight (Cap1 - Cap6) ($);
datalines;
M 18-34 135 A B C D E F
F 35-54 115 G H I J K L
;;;;
RUN;
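
Applied to the file-based step from the question, the same fix might look like the sketch below. It reuses the INFILE options from the question unchanged and has not been run against the original Capture.txt; the PROC CONTENTS step is an added check, not part of the question.

```sas
DATA Capture;
    INFILE '/folders/myfolders/sasuser.v94/Capture.txt' DLM='09'x DSD MISSOVER FIRSTOBS=2;
    /* The parenthesized format list ($) is reused for every variable in (Cap1 - Cap6) */
    INPUT Sex $ AgeGroup $ Weight (Cap1 - Cap6) ($);
RUN;

/* List the variable types to confirm that Cap1 through Cap6 are character */
proc contents data=Capture;
run;
```

Character variables read this way get the default length of 8, so longer values would need an explicit LENGTH or informat.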

Upvotes: 3
