Sathya Ih
Sathya Ih

Reputation: 1

Importing txt file in SAS

I tried to import text file in sas with the following code

PROC IMPORT DATAFILE= '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt' 
OUT= outdata 
DBMS=dlm 
REPLACE; 
delimiter='09'x;
GETNAMES=YES; 
RUN;

But I am getting import unsuccessful because text file has period for missing data

this is what i got in SAS log

NOTE: Invalid data for class_size in line 455 16-17.
 
 455 CHAR  454.34.8.32.17.NA.23.125.12.188 31
     ZONE  3330330303303304403323330332333
     NUMR  454934989329179E1923E125912E188
 sl_no=454 school=34 iq=8 test=32 ses=17 class_size=. meanses=23.125 meaniq=12.188 _ERROR_=1 _N_=454

how can load this text file in SAS

Upvotes: 0

Views: 1972

Answers (2)

Tom
Tom

Reputation: 51566

Did you create that text file from R? That package has a nasty habit of putting text values of NA for numeric values into text files. If you are the one that created the file the you might check if the system you are using has a way to not put the NA into the file to begin with. In a delimited file missing values are normally represented by having nothing for the field. So the delimiters are right next to each other. For SAS you can use a period to represent a missing value.

I wouldn't bother to use PROC IMPORT to read a delimited file. Just write a data step to read the file. Since it looks like your file only has six variables and they are all numeric the code is trivial.

data outdata;
   infile '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt' 
     dsd dlm='09'x firstobs=2 truncover
   ;
   input sl_no school iq test ses class_size meanses meaniq ;
run;

One way to deal with the NA text in the input file is to replace them with periods. Since all of the fields are numeric you can do that easily because you don't have to worry about replacing real text that just happens to have the letter A after the letter N. Here is trick using the _INFILE_ automatic variable that you can use to make the change on the fly while reading the file.

data outdata;
   infile '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt' 
     dsd dlm='09'x firstobs=2 truncover
   ;
   input @;
   _infile_=tranwrd(_infile_,'NA','.');
   input sl_no school iq test ses class_size meanses meaniq ;
run;

Upvotes: 1

Richard
Richard

Reputation: 27498

You are getting the NOTE: because of the NA value in the class_size field.

What you presume are periods (.) are actually tabs (hex code 09). Look under the period to confirm, the ZONE is 0 and NUMR 9. 09 is the tab character.

Proc IMPORT guesses each fields data type based on looking at the first few rows (default is 20 rows) of a text file. Your file contained only numbers the 20 rows, so the procedure guessed class_size was numeric.

There a couple of courses of action.

  • Do nothing. Read your log NOTES and know the places where NA occurred you will have a missing value in your data set.
  • or,
    Read the file as-is, but add GUESSINGROWS=MAX; statement to your import code
    • The mixed data type column class_size will be guessed as character and you might have to do another step to convert the values to numeric (a step in which the non-digit values get converted to missing values)
  • or,
    Edit the text file replacing all the NA with a period (.). The dot marks a missing value during IMPORT. The IMPORT step will have no incongruities to LOG about.

Converting a field

PROC IMPORT DATAFILE= '/home/u44418748/MSc Biostatistics with SAS/Datasets/school.txt'
  DBMS=dlm REPLACE OUT=work.outdata;

  delimiter='09'x;
  GETNAMES=YES; 
  GUESSINGROWS=MAX;
RUN;

data want;
  set outdata (rename=(class_size=class_size_char));
  class_size = input (class_size_char, ?? best12.);
  drop class_size_char;
run;

Upvotes: 0

Related Questions