Reputation: 1035
This seems incredibly basic, but I simply can't find the right informat in SAS to read in the kind of data I have, which looks like this:
9 Bittersweet #FD7C6E (253, 124, 110) 48 1949
10 Black #000000 (0,0,0) 8 1903
I need to read in the values in parentheses into three separate numeric variables, and there's no informat I can find that simply "reads in numeric characters until it encounters a non-numeric character." The file is not completely comma-separated, more's the pity (whoever "designed" this file format should be shot, dead, buried, resurrected, and shot again!) The problem with the data in parentheses is that sometimes there is a space after a comma, and sometimes not. I've gotten the first number and the first set of characters after the number read in via column input, since the #
is always in column 32. I've read in the six-digit hex value (just using character there).
Here is my MWE:
Data crayons;
Infile 'path\crayons.dat' MISSOVER;
Input crayon_number
color_name $ 4-31
hex_code $ 33-38 @42
red 3. @','
green 3. @','
blue 3. @')'
pack_size
year_issued
year_retired;
Run;
The Bittersweet
line is read in correctly, but not the Black
line. (year_retired
is blank for both of these - I'm not concerned about that.) In the Black
line, I get the hex_code
variable correctly, but nothing after that.
So I guess the central question is this: how do I read in an integer of varying length that is guaranteed NOT to contain a comma, particularly when it is immediately followed by a comma?
Perhaps at a higher level: where can I go to find these sorts of things out? I have these questions about reading in dirty data, and I don't know where to go to find out. The SAS Language Reference is woefully inadequate for this, in my experience. If data fits into their neat little boxes, you're good to go. Anything outside of that, and their reference is useless.
Thank you very much for your time!
Upvotes: 0
Views: 653
Reputation: 9109
I would use list input with delimiters=' (,)'
Data crayons;
infile cards dlm=' (,)' missover;
Input crayon_number
color_name &$28.
hex_code $
red
green
blue
pack_size
year_issued
year_retired;
list;
cards;
9 Bittersweet #FD7C6E (253, 124, 110) 48 1949
10 Black #000000 (0,0,0) 8 1903
Upvotes: 2
Reputation: 21294
Another option is to read it in as a character as a whole field and use SCAN() later on.
Since you mention that the position is fixed, it sounds like you're reading a fixed width type file?
red = scan(orig_var, 1, "(,)");
green = scan(orig_var, 2, "(,)");
blue = scan(orig_var, 3, "(,)");
Upvotes: 0