nmpeterson
nmpeterson

Reputation: 49

How to read this multi-format data in SAS?

I have an odd dataset that I need to import into SAS, splitting the records into two tables depending on formatting, and dropping some records altogether. The data is structured as follows:

c Comment line 1
c Comment line 2
t lines init
a 'mme006'   M   8   99   15   '111 ME - RANDOLPH ST'
  path=no
    dwt=0.01  42427  ttf=1  us1=3  us2=0
    dwt=#0   42350  ttf=1  us1=1.8  us2=0  lay=3
    dwt=>0  42352  ttf=1  us1=0.5  us2=18.13
    42349  lay=3
a 'mme007'   M   8   99   15   '111 ME - RANDOLPH ST'
  path=no
    dwt=+0  42367  ttf=1  us1=0.6  us2=0
    dwt=0.01  42368  ttf=1  us1=0.6  us2=35.63 lay=3
    dwt=#0  42369  ttf=1  us1=0.3  us2=0
    42381  lay=3

Only the lines beginning with a, dwt or an integer need to be kept.

For the lines beginning with a, the desired output is a table like this, called "lines", which contains the first two non-a values in the row:

 name   | type
--------+------
 mme006 | M
 mme007 | M

For the dwt/integer rows, the table "itins" would look like so:

 anode | dwt  | ttf | us1 | us2   | lay
 ------+------+-----+-----+-------+-----
 42427 | 0.01 |   1 | 3.0 |  0.00 |
 42350 | #0   |   1 | 1.8 |  0.00 |   3
 42352 | >0   |   1 | 0.5 | 18.13 | 
 42349 |      |     |     |       |   3       <-- line starting with integer
 42367 | +0   |   1 | 0.6 |  0.00 |
 42368 | 0.01 |   1 | 0.6 | 35.63 |   3
 42369 | #0   |   1 | 0.3 |  0.00 |
 42381 |      |     |     |       |   3       <-- line starting with integer

The code I have so far is almost there, but not quite:

data lines itins;
  infile in1 missover;
  input @1 first $1. @;
      if first in ('c','t') then delete;
      else if first='a' then do;
        input name $ type $;
        output lines; end;
      else do;
        input @1 path=$ dwt=$ anode ttf= us1= us2= us3= lay=;
        if path='no' then delete;
        output itins; end;

The problems:

What am I doing wrong?

Upvotes: 0

Views: 87

Answers (1)

Joe
Joe

Reputation: 63434

DEQUOTE() will remove matched quotation marks.

Your problem with dwt is that you'll need to tell it what informat to use; so if dwt is four long, :$4. instead of just $.

However, anode is a problem. The solution I came up with is:

data lines itins;
  infile in1 missover;
  input @1 first $1. @;
      if first in ('c','t') then delete;
      else if first='a' then do;
        input name $ type $;
        output lines; end;
      else do;
        input @1 path= $ @;
        if path='no' then delete;
        else do;
            if substr(_infile_,5,1)='d' then do;
                input dwt= :$12. ttf= us1= us2= us3= lay=;
                anode=input(scan(dwt,2,' '),best.);
                dwt=scan(dwt,1,' ');
                output itins; 
            end;
            else do;
                input @5 anode 5. lay=;
                output itins;
            end;
        end;
    end;

run;

Basically, check for plan first; then if it's not a plan row, check for the 'd' in dwt. If that's present, read in a line like that, incorporating anode into dwt and then splitting it off later. If it's not present, just read in anode and lay.

If dwt can have widths other than 2-4 such that it might need to be shorter, then this probably won't work, and you'll have to explicitly figure out the position of anode to read it in properly.

Upvotes: 1

Related Questions