Reputation: 471
I am trying to read a text file into SAS. The data does not have any headers. All I want is to remove duplicates based on the values in column 3. The text file looks like this:
P0780043,866.05,2200,3.79,140,1
P0780043,866.05,2300,3.84,140,1
P0780043,866.05,2300,3.84,140,1
P0780043,866.05,0000,3.89,140,1
I want the result to be:
P0780043,866.05,2200,3.79,140,1
P0780043,866.05,2300,3.84,140,1
P0780043,866.05,0000,3.89,140,1
I am using the code below:
%let flname1=D:\temp\wq_%sysfunc(today(),yymmddn8.).txt;
%put &=flname1;
data one;
infile "&flname1" dsd dlm=',';
input x1-x6;
proc sort data=one out=nodup nodupkey;
By x3;
run;
The code does not read the first column for some reason, and I am not sure why. It's probably something very obvious, but I am fairly new to SAS. Any help would be appreciated. Thanks!
Upvotes: 0
Views: 1053
Reputation: 63424
Tom's hit it on the nose; you have to tell SAS to read in variables as character, or it assumes they're numeric.
In your particular case, if you want to read it in without thinking about it, you could use PROC IMPORT, which will figure out what each column should be read in as, with some degree of success. It has drawbacks (particularly if your data are mostly numeric but have the very occasional character value).
proc import file="&flname1." out=one dbms=csv replace;
getnames=no; *Instructs SAS not to treat the first row as variable names;
run;
This approach is fairly common when you're going to be looking at the data manually and the data are fairly consistent. It's a bad idea in a production environment (particularly when you're not looking at the file on each run), because some details of the file (particularly column lengths and formats) could change from run to run. PROC IMPORT also writes the code it generates to the log; you can copy and paste that into your .sas file in place of the PROC IMPORT if you want the infile read-in but would like SAS to produce the first pass so you don't have to type it all in.
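One detail to watch if you go this route: with getnames=no, PROC IMPORT names the columns VAR1, VAR2, and so on, not x1-x6. A minimal sketch of the de-duplication step under that assumption (dataset ONE created by the PROC IMPORT above; the PROC CONTENTS call is only there to check what types were guessed):

```sas
/* Check which types and lengths PROC IMPORT guessed for each column */
proc contents data=one;
run;

/* Assumes ONE came from PROC IMPORT with GETNAMES=NO,
   so the third column is named VAR3 */
proc sort data=one out=nodup nodupkey;
  by var3;
run;
```

If PROC IMPORT decides the third column is numeric, a value like 0000 will be stored as 0.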
Upvotes: 0
Reputation: 51566
Your problem is that the first column is character and your program is trying to read it as numeric. Either read the first column as character or read them all as character.
data one;
infile "&flname1" dsd dlm=',';
length x1 $8 ;
input x1-x6;
run;
proc sort data=one out=nodup nodupkey;
by x3;
run;
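For the second option, reading every column as character, a sketch along these lines should work. The $20 length is an assumption, chosen only to be comfortably wider than the sample values; adjust it to your real data.

```sas
data one;
  infile "&flname1" dsd dlm=',';
  /* Declare all six variables as character before INPUT;
     $20 is an assumed length, not taken from the file */
  length x1-x6 $20;
  input x1-x6;
run;

/* Keep the first row for each distinct x3 value */
proc sort data=one out=nodup nodupkey;
  by x3;
run;
```

Reading x3 as character keeps the text exactly as it appears in the file, so 0000 stays 0000. Note that PROC SORT writes the output in x3 order, so the 0000 row will come first rather than last.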
Upvotes: 1