DukeLuke
DukeLuke

Reputation: 315

SAS - Reading Raw/Delimited file

I'm having an issue reading a CSV file into a SAS dataset without bringing each field with my import. I don't want every field imported, but that's the only way I can seem to get this to work. The issue is I cannot get SAS to read my data correctly, even if it's reading the columns correctly... I think part of the issue is that I have data above my actual column headers that I don't want to read in.

My data is laid out like so

somevalue somevalue somevalue...
var1   var2   var3    var4
abc    abc    abc     abc

Where I want to exclude somevalue, only read in select var's and their corresponding data.

Below is a sample file where I've scrambled all the values in my fields. I only want to keep COLUMN H(8), AT(46) and BE(57)

Here's some code I've tried so far...

This was SAS generated from a PROC IMPORT. My PROC IMPORT worked fine to read in every field value, so I just deleted the fields that I didn't want, but I don't get the output I expect. The values corresponding to the fields does not match.

    A) PROC IMPORT
                DATAFILE="C:\Users\dip1\Desktop\TU_&YYMM._FIN.csv"
                OUT=TU_&YYMM._FIN
                DBMS=csv REPLACE;
                GETNAMES=NO;
                DATAROW=3;


RUN;

generated this in the SAS log (I cut out the other fields I didn't want)

 B) DATA TU_&YYMM._FIN_TEST;
infile 'C:\Users\fip1\Desktop\TU_1701_FIN.csv' delimiter = ',' DSD lrecl=32767
firstobs=3 ;
informat VAR8 16. ;
informat VAR46 $1. ;
informat VAR57 $22. ;
format VAR8 16. ;
format VAR46 $1. ;
format VAR57 $22. ;
input
VAR8
VAR46 $
VAR57 $;
run;

I've also tried this below... I believe I'm just missing something..

C) DATA TU_TEST;

INFILE "C:\Users\fip1\Desktop\TU_&yymm._fin.csv" DLM = "," TRUNCOVER FIRSTOBS = 3;

LABEL ACCOUNT_NUMBER = "ACCOUNT NUMBER";

LENGTH ACCOUNT_NUMBER $16.
       E $1.
       REJECTSUBCATEGORY $22.;

INPUT  ACCOUNT_NUMBER
       E
       REJECTSUBCATEGORY;
RUN; 

As well as trying to have SAS point to the columns I want to read in, modifying the above to:

  D) DATA TU_TEST;

INFILE "C:\Users\fip1\Desktop\TU_&yymm._fin.csv" DLM = "," TRUNCOVER FIRSTOBS = 3;

LABEL ACCOUNT_NUMBER = "ACCOUNT NUMBER";

LENGTH ACCOUNT_NUMBER $16.
       E $1.
       REJECTSUBCATEGORY $22.;

INPUT  @8 ACCOUNT_NUMBER
       @46 E
       @57 REJECTSUBCATEGORY;
RUN; 

None of which work. Again, I can do this successfully if I bring in all of the fields with either A) or B), given that B) includes all the fields, but I can't get C) or D) to work, and I want to keep the code to a minimum if I can. I'm sure I'm missing something, but I've never had time to tinker with it so I've just been doing it the "long" way..

Here's a snippet of what the data looks like

    A(1)        B(2)     C(3)         D(4)          E(5)     F(6)     G(7)
ABCDEFGHIJ    ABCDMCARD   202020    4578917      12345674    457894A    (blank)
   CRA      INTERNALID   SUBCODE     RKEY           SEGT      FNM       FILEDATE
CREDITBUR      2ABH123  AB2CHE123 A28O5176688        J2       Name       8974561

Upvotes: 0

Views: 213

Answers (1)

Tom
Tom

Reputation: 51621

With a delimited file you need to read all of the fields (or at least all of the fields up to the last one you want to keep) even if you do not want to keep all of those fields. For the ones you want to skip you can just read them into a dummy variable that you drop. Or even one of the variables you want to keep that you will overwrite by reading from a later column.

Also don't model your DATA step after the code generated by PROC IMPORT. You can make much cleaner code yourself. For example there is no need for any FORMAT or INFORMAT statements for the three variables you listed. Although if VAR8 really needs 16 digits you might want to attach a format to it so that SAS doesn't use BEST12. format.

data tu_&yymm._fin_test;
  infile 'C:\Users\fip1\Desktop\TU_1701_FIN.csv'
         dlm=',' dsd lrecl=32767 truncover firstobs=3 
  ;
  length var8 8 var46 $1 var57 $22 ;
  length dummy $1 ;
  input 7*dummy var8 37*dummy var46 10*dummy var57 ;
  drop dummy ;
  format var8 16. ;
run;

You can replace the VARxx variable names with more meaningful ones if you want (or add a RENAME statement). Using the position numbers here just makes it clearer in this code that the INPUT statement is reading the 57 columns from the input data.

Upvotes: 1

Related Questions