Reputation: 49
I have an odd dataset that I need to import into SAS, splitting the records into two tables depending on formatting, and dropping some records altogether. The data is structured as follows:
c Comment line 1
c Comment line 2
t lines init
a 'mme006' M 8 99 15 '111 ME - RANDOLPH ST'
path=no
dwt=0.01 42427 ttf=1 us1=3 us2=0
dwt=#0 42350 ttf=1 us1=1.8 us2=0 lay=3
dwt=>0 42352 ttf=1 us1=0.5 us2=18.13
42349 lay=3
a 'mme007' M 8 99 15 '111 ME - RANDOLPH ST'
path=no
dwt=+0 42367 ttf=1 us1=0.6 us2=0
dwt=0.01 42368 ttf=1 us1=0.6 us2=35.63 lay=3
dwt=#0 42369 ttf=1 us1=0.3 us2=0
42381 lay=3
Only the lines beginning with a
, dwt
or an integer need to be kept.
For the lines beginning with a
, the desired output is a table like this, called "lines", which contains the first two non-a
values in the row:
name | type
--------+------
mme006 | M
mme007 | M
For the dwt
/integer rows, the table "itins" would look like so:
anode | dwt | ttf | us1 | us2 | lay
------+------+-----+-----+-------+-----
42427 | 0.01 | 1 | 3.0 | 0.00 |
42350 | #0 | 1 | 1.8 | 0.00 | 3
42352 | >0 | 1 | 0.5 | 18.13 |
42349 | | | | | 3 <-- line starting with integer
42367 | +0 | 1 | 0.6 | 0.00 |
42368 | 0.01 | 1 | 0.6 | 35.63 | 3
42369 | #0 | 1 | 0.3 | 0.00 |
42381 | | | | | 3 <-- line starting with integer
The code I have so far is almost there, but not quite:
data lines itins;
infile in1 missover;
input @1 first $1. @;
if first in ('c','t') then delete;
else if first='a' then do;
input name $ type $;
output lines; end;
else do;
input @1 path=$ dwt=$ anode ttf= us1= us2= us3= lay=;
if path='no' then delete;
output itins; end;
The problems:
'mme006'
)#0 4236
and 0.01 42
, always 8 characters long, borrowing part of what should be in "anode".What am I doing wrong?
Upvotes: 0
Views: 87
Reputation: 63434
DEQUOTE() will remove matched quotation marks.
Your problem with dwt
is that you'll need to tell it what informat to use; so if dwt is four long, :$4.
instead of just $
.
However, anode is a problem. The solution I came up with is:
data lines itins;
infile in1 missover;
input @1 first $1. @;
if first in ('c','t') then delete;
else if first='a' then do;
input name $ type $;
output lines; end;
else do;
input @1 path= $ @;
if path='no' then delete;
else do;
if substr(_infile_,5,1)='d' then do;
input dwt= :$12. ttf= us1= us2= us3= lay=;
anode=input(scan(dwt,2,' '),best.);
dwt=scan(dwt,1,' ');
output itins;
end;
else do;
input @5 anode 5. lay=;
output itins;
end;
end;
end;
run;
Basically, check for plan first; then if it's not a plan row, check for the 'd' in dwt. If that's present, read in a line like that, incorporating anode into dwt and then splitting it off later. If it's not present, just read in anode and lay.
If dwt can have widths other than 2-4 such that it might need to be shorter, then this probably won't work, and you'll have to explicitly figure out the position of anode to read it in properly.
Upvotes: 1