Reputation: 315
I'm having an issue reading a CSV file into a SAS dataset without bringing each field with my import. I don't want every field imported, but that's the only way I can seem to get this to work. The issue is I cannot get SAS to read my data correctly, even if it's reading the columns correctly... I think part of the issue is that I have data above my actual column headers that I don't want to read in.
My data is laid out like so
somevalue somevalue somevalue...
var1 var2 var3 var4
abc abc abc abc
Where I want to exclude somevalue, only read in select var's and their corresponding data.
Below is a sample file where I've scrambled all the values in my fields. I only want to keep COLUMN H(8), AT(46) and BE(57)
Here's some code I've tried so far...
This was SAS generated from a PROC IMPORT. My PROC IMPORT worked fine to read in every field value, so I just deleted the fields that I didn't want, but I don't get the output I expect. The values corresponding to the fields does not match.
A) PROC IMPORT
DATAFILE="C:\Users\dip1\Desktop\TU_&YYMM._FIN.csv"
OUT=TU_&YYMM._FIN
DBMS=csv REPLACE;
GETNAMES=NO;
DATAROW=3;
RUN;
generated this in the SAS log (I cut out the other fields I didn't want)
B) DATA TU_&YYMM._FIN_TEST;
infile 'C:\Users\fip1\Desktop\TU_1701_FIN.csv' delimiter = ',' DSD lrecl=32767
firstobs=3 ;
informat VAR8 16. ;
informat VAR46 $1. ;
informat VAR57 $22. ;
format VAR8 16. ;
format VAR46 $1. ;
format VAR57 $22. ;
input
VAR8
VAR46 $
VAR57 $;
run;
I've also tried this below... I believe I'm just missing something..
C) DATA TU_TEST;
INFILE "C:\Users\fip1\Desktop\TU_&yymm._fin.csv" DLM = "," TRUNCOVER FIRSTOBS = 3;
LABEL ACCOUNT_NUMBER = "ACCOUNT NUMBER";
LENGTH ACCOUNT_NUMBER $16.
E $1.
REJECTSUBCATEGORY $22.;
INPUT ACCOUNT_NUMBER
E
REJECTSUBCATEGORY;
RUN;
As well as trying to have SAS point to the columns I want to read in, modifying the above to:
D) DATA TU_TEST;
INFILE "C:\Users\fip1\Desktop\TU_&yymm._fin.csv" DLM = "," TRUNCOVER FIRSTOBS = 3;
LABEL ACCOUNT_NUMBER = "ACCOUNT NUMBER";
LENGTH ACCOUNT_NUMBER $16.
E $1.
REJECTSUBCATEGORY $22.;
INPUT @8 ACCOUNT_NUMBER
@46 E
@57 REJECTSUBCATEGORY;
RUN;
None of which work. Again, I can do this successfully if I bring in all of the fields with either A) or B), given that B) includes all the fields, but I can't get C) or D) to work, and I want to keep the code to a minimum if I can. I'm sure I'm missing something, but I've never had time to tinker with it so I've just been doing it the "long" way..
Here's a snippet of what the data looks like
A(1) B(2) C(3) D(4) E(5) F(6) G(7) ABCDEFGHIJ ABCDMCARD 202020 4578917 12345674 457894A (blank) CRA INTERNALID SUBCODE RKEY SEGT FNM FILEDATE CREDITBUR 2ABH123 AB2CHE123 A28O5176688 J2 Name 8974561
Upvotes: 0
Views: 213
Reputation: 51621
With a delimited file you need to read all of the fields (or at least all of the fields up to the last one you want to keep) even if you do not want to keep all of those fields. For the ones you want to skip you can just read them into a dummy variable that you drop. Or even one of the variables you want to keep that you will overwrite by reading from a later column.
Also don't model your DATA step after the code generated by PROC IMPORT. You can make much cleaner code yourself. For example there is no need for any FORMAT or INFORMAT statements for the three variables you listed. Although if VAR8 really needs 16 digits you might want to attach a format to it so that SAS doesn't use BEST12. format.
data tu_&yymm._fin_test;
infile 'C:\Users\fip1\Desktop\TU_1701_FIN.csv'
dlm=',' dsd lrecl=32767 truncover firstobs=3
;
length var8 8 var46 $1 var57 $22 ;
length dummy $1 ;
input 7*dummy var8 37*dummy var46 10*dummy var57 ;
drop dummy ;
format var8 16. ;
run;
You can replace the VARxx variable names with more meaningful ones if you want (or add a RENAME statement). Using the position numbers here just makes it clearer in this code that the INPUT statement is reading the 57 columns from the input data.
Upvotes: 1