Reputation: 1
I tried to import text file in sas
with the following code
PROC IMPORT DATAFILE= '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
OUT= outdata
DBMS=dlm
REPLACE;
delimiter='09'x;
GETNAMES=YES;
RUN;
But I am getting import unsuccessful because text file has period for missing data
this is what i got in SAS
log
NOTE: Invalid data for class_size in line 455 16-17.
455 CHAR 454.34.8.32.17.NA.23.125.12.188 31
ZONE 3330330303303304403323330332333
NUMR 454934989329179E1923E125912E188
sl_no=454 school=34 iq=8 test=32 ses=17 class_size=. meanses=23.125 meaniq=12.188 _ERROR_=1 _N_=454
how can load this text file in SAS
Upvotes: 0
Views: 1972
Reputation: 51566
Did you create that text file from R? That package has a nasty habit of putting text values of NA for numeric values into text files. If you are the one that created the file the you might check if the system you are using has a way to not put the NA into the file to begin with. In a delimited file missing values are normally represented by having nothing for the field. So the delimiters are right next to each other. For SAS you can use a period to represent a missing value.
I wouldn't bother to use PROC IMPORT to read a delimited file. Just write a data step to read the file. Since it looks like your file only has six variables and they are all numeric the code is trivial.
data outdata;
infile '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
dsd dlm='09'x firstobs=2 truncover
;
input sl_no school iq test ses class_size meanses meaniq ;
run;
One way to deal with the NA text in the input file is to replace them with periods. Since all of the fields are numeric you can do that easily because you don't have to worry about replacing real text that just happens to have the letter A after the letter N. Here is trick using the _INFILE_
automatic variable that you can use to make the change on the fly while reading the file.
data outdata;
infile '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
dsd dlm='09'x firstobs=2 truncover
;
input @;
_infile_=tranwrd(_infile_,'NA','.');
input sl_no school iq test ses class_size meanses meaniq ;
run;
Upvotes: 1
Reputation: 27498
You are getting the NOTE:
because of the NA
value in the class_size
field.
What you presume are periods (.
) are actually tabs (hex code 09
). Look under the period to confirm, the ZONE
is 0
and NUMR
9
. 09
is the tab character.
Proc IMPORT
guesses each fields data type based on looking at the first few rows (default is 20 rows) of a text file. Your file contained only numbers the 20 rows, so the procedure guessed class_size
was numeric.
There a couple of courses of action.
NOTES
and know the places where NA
occurred you will have a missing value in your data set.GUESSINGROWS=MAX;
statement to your import code
class_size
will be guessed as character and you might have to do another step to convert the values to numeric (a step in which the non-digit values get converted to missing values)NA
with a period (.
). The dot marks a missing value during IMPORT. The IMPORT step will have no incongruities to LOG about.Converting a field
PROC IMPORT DATAFILE= '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
DBMS=dlm REPLACE OUT=work.outdata;
delimiter='09'x;
GETNAMES=YES;
GUESSINGROWS=MAX;
RUN;
data want;
set outdata (rename=(class_size=class_size_char));
class_size = input (class_size_char, ?? best12.);
drop class_size_char;
run;
Upvotes: 0