Melville
Melville

Reputation: 1

Importing multiple text (txt) files into SAS (files have variable attributes in first 2 rows)

I am trying to import multiple text files into SAS. The peculiarity of the data is that the first row has the labels for some of the variables and the second row has text indicating type of some of the variables. The third row has the variable names. I was intending to use a macro to read the files as the first 7 variables have the same names. I am not sure how to programmatically handle the variable attributes in the files. Please suggest how I could do this.

The code so far:

    %macro text2sas(filenam=);

  proc import datafile="../&filenam..txt"
              out="&filenam"
              dbms=dlm replace ;
              delimiter = '09'x;
              getnames=no;
              datarow=1;
              guessingrows=max;
  run;
%mend text2sas;

%text2sas(filenam=convdat);
%text2sas(filenam=tratdat);

The data for convdat.txt looks like this:

"Dance retail:" "Dummy measurement completed successfully?" "Dramatic measurements?"    "Maximal travel :"  "Velocity time at start:"   "Mean velocity at start:"   "Maximal velocity at end:"  "Velocity time iinterval:"  "Mean velocity interval:"   "Crain Dp:" 
                            date    string  string  number  number  number  number  number  number  number  
RELAXT  RAIN    PLUCK   RAPPLE  VRAT    GROSS   PANGLE  "Straint"   "Etramp"    "Crumpa"    "Cafin" "Cafinat"   "Cafinab"   "Cafinavr"  "Cafinap"   "cafinal"   
X5980B00099 "CF"    G0001001    1234    "Vlapa1"    1   "Crt appoi" "10-May-2010"   "1" "1" ""  ""  ""  ""  ""  ""  ""  
X5980B00099 "CF"    G0001002    1234    "Vlapa1"    1   "Crt appoi" "13-May-2010"   "1" "1" ""  ""  ""  ""  ""  ""  ""  
X5980B00099 "CF"    G0001003    1234    "Vlapa1"    1   "Crt appoi" "19-may-2010"   "1" "1" ""  ""  ""  ""  ""  ""  ""  
X5980B00099 "CF"    G0001004    1234    "Vlapa1"    1   "Crt appoi" "26-may-2010"   "1" "1" "0.45"  "0.55"  "0.98"  "0.76"  "0.98"  "0.12"  "5.77"  

Data for tratdat looks like this:

                            "Arbitrary carpets" "Household items"   "Garage material"   "Sundry data (everything else)" "Vehicle number"    "Strains"   "ITM"   "Finals"    "Dreadspan" "Printers"  "Comment 1" "comment 2" "Grapple"   "Drops" "Triangles" 
                            boolean boolean boolean boolean boolean boolean boolean boolean boolean boolean string  boolean boolean boolean boolean 
RELAXT  RAIN    PLUCK   RAPPLE  VRAT    GROSS   PANGLE  "Ant"   "App"   "Cro"   "BRon"  "Dramas"    "Slacks"    "CRAT"  "Frob"  "Rilo"  "Ph7jj" "P10rt" "Irup"  "GLk2"  "Dap3"  "Oreta" 
X5980B00099 "GB"    G0001001    1234    "Vlapa1"    1   "Pangolin train"    ""  "checked"   ""  "checked"   ""  "checked"   "checked"   ""  ""  ""  ""  ""  ""  ""  ""  
X5980B00099 "GB"    G0001002    1234    "Vlapa1"    1   "Pangolin train"    ""  ""  ""  "checked"   ""  "checked"   "checked"   ""  ""  ""  ""  ""  ""  ""  ""  
X5980B00099 "GB"    G0001003    1234    "Vlapa1"    1   "Pangolin train"    "checked"   ""  ""  "checked"   ""  "checked"   "checked"   ""  ""  ""  ""  ""  ""  ""  ""  
X5980B00099 "GB"    G0001004    1234    "Vlapa1"    1   "Pangolin train"    "checked"   ""  ""  "checked"   ""  "checked"   "checked"   ""  ""  ""  ""  ""  ""  ""  ""  

Upvotes: 0

Views: 457

Answers (2)

Tom
Tom

Reputation: 51566

Looks like the first three lines have the LABEL, TYPE and NAME for the columns. So read that first and use the information to generate code to read the actual lines of data.

Something like this:

data headers ;
  length row col 8 type $32 value $200 ;
  infile file2 dsd dlm='09'x truncover length=ll column=cc ;
  do type='LABEL','TYPE','NAME';
    row+1;
    do col=1 by 1 until(cc>ll);
      input value @ ;
      if not missing(value) then output;
    end;
    input;
  end;
  stop;
run;
proc sort; by col row; run;
proc transpose data=headers out=meta(drop=_name_) ;
  by col;
  id type ;
  var value;
run;

Which for that second file should get data like:

Obs    col    NAME      LABEL                             TYPE

  1      1    RELAXT
  2      2    RAIN
  3      3    PLUCK
  4      4    RAPPLE
  5      5    VRAT
  6      6    GROSS
  7      7    PANGLE
  8      8    Ant       Arbitrary carpets                boolean
  9      9    App       Household items                  boolean
 10     10    Cro       Garage material                  boolean
 11     11    BRon      Sundry data (everything else)    boolean
 12     12    Dramas    Vehicle number                   boolean
 13     13    Slacks    Strains                          boolean
 14     14    CRAT      ITM                              boolean
 15     15    Frob      Finals                           boolean
 16     16    Rilo      Dreadspan                        boolean
 17     17    Ph7jj     Printers                         boolean
 18     18    P10rt     Comment 1                        string
 19     19    Irup      comment 2                        boolean
 20     20    GLk2      Grapple                          boolean
 21     21    Dap3      Drops                            boolean
 22     22    Oreta     Triangles                        boolean

Which you might use to generate code like:

data want ;
  infile file2 dsd dlm='09'x truncover firstobs=4 ;
  input
    RELAXT :$20.
    RAIN :$5.
    PLUCK :$20.
    RAPPLE 
    VRAT  :$20.
    GROSS 
    PANGLE :$40.
    Ant :$1.
    App :$1.
    Cro :$1.
    BRon :$1.
    Dramas :$1.
    Slacks :$1.
    CRAT :$1.
    Frob :$1.
    Rilo :$1.
    Ph7jj :$1.
    P10rt :$50.
    Irup :$1.
    GLk2 :$1.
    Dap3 :$1.
    Oreta :$1.
  ;
  label
    Ant ="Arbitrary carpets"
    App ="Household items"
    Cro ="Garage material"
    BRon ="Sundry data (everything else)"
    Dramas ="Vehicle number"
    Slacks ="Strains"
    CRAT ="ITM"
    Frob ="Finals"
    Rilo ="Dreadspan"
    Ph7jj ="Printers"
    P10rt ="Comment 1"
    Irup ="comment 2"
    GLk2 ="Grapple"
    Dap3 ="Drops"
    Oreta ="Triangles"
  ;
run;

Upvotes: 0

Joe
Joe

Reputation: 63424

The ultimate input will involve telling SAS to go to row 3, but as Reeza notes, you will lose your metadata if you just skip to Datarow=4.

I recommend parsing the file in a preprocessing step, and converting that metadata into input statements. This may be complicated, but it shouldn't be too bad... it is however outside the scope of a StackOverflow answer.

You can look into my presentations Writing Code With Your Data and Documentation Driven Programming (co-author) to see what kind of things you can do as far as writing the input statements. You don't have exactly what either of these expect, but you can input those first few lines using data step input and then transpose that dataset to a more useful format.

Upvotes: 1

Related Questions