Adrian Keister
Adrian Keister

Reputation: 1035

SAS Input: Integers Separated by Commas

This seems incredibly basic, but I simply can't find the right informat in SAS to read in the kind of data I have, which looks like this:

9  Bittersweet                 #FD7C6E  (253, 124, 110)  48 1949
10  Black                      #000000  (0,0,0)          8 1903

I need to read in the values in parentheses into three separate numeric variables, and there's no informat I can find that simply "reads in numeric characters until it encounters a non-numeric character." The file is not completely comma-separated, more's the pity (whoever "designed" this file format should be shot, dead, buried, resurrected, and shot again!) The problem with the data in parentheses is that sometimes there is a space after a comma, and sometimes not. I've gotten the first number and the first set of characters after the number read in via column input, since the # is always in column 32. I've read in the six-digit hex value (just using character there).

Here is my MWE:

Data crayons;
    Infile 'path\crayons.dat' MISSOVER;
    Input crayon_number
          color_name $ 4-31
          hex_code   $ 33-38 @42
          red   3. @','
          green 3. @','
          blue  3. @')'
          pack_size
          year_issued
          year_retired;
Run;

The Bittersweet line is read in correctly, but not the Black line. (year_retired is blank for both of these - I'm not concerned about that.) In the Black line, I get the hex_code variable correctly, but nothing after that.

So I guess the central question is this: how do I read in an integer of varying length that is guaranteed NOT to contain a comma, particularly when it is immediately followed by a comma?

Perhaps at a higher level: where can I go to find these sorts of things out? I have these questions about reading in dirty data, and I don't know where to go to find out. The SAS Language Reference is woefully inadequate for this, in my experience. If data fits into their neat little boxes, you're good to go. Anything outside of that, and their reference is useless.

Thank you very much for your time!

Upvotes: 0

Views: 653

Answers (2)

data _null_
data _null_

Reputation: 9109

I would use list input with delimiters=' (,)'

Data crayons;
   infile cards dlm=' (,)' missover;
   Input crayon_number
          color_name &$28.
          hex_code $   
          red   
          green 
          blue  
          pack_size
          year_issued
          year_retired;
   list;
   cards;
9  Bittersweet                 #FD7C6E  (253, 124, 110)  48 1949
10  Black                      #000000  (0,0,0)          8 1903

Upvotes: 2

Reeza
Reeza

Reputation: 21294

Another option is to read it in as a character as a whole field and use SCAN() later on.

Since you mention that the position is fixed, it sounds like you're reading a fixed width type file?

red = scan(orig_var, 1, "(,)");
green = scan(orig_var, 2, "(,)");
blue = scan(orig_var, 3, "(,)");

Upvotes: 0

Related Questions