videorama17
videorama17

Reputation: 25

Specify format with proc import sas

I'm trying to import some data from a .txt file into WPS (which works with SAS language). Everythings worked perfectly until I realized that for one column/variable, some of my observations were cut. For example in the column/variable Type, my 5 first observations in the original file were:

RAN
FG
MAJOFG
ARBA
ARBV

but in my data set in sas, here's the data I get:

RAN
FG
MAJ
ARB
ARB

So I thought about using a format during the import to make sas always take the $8. format. Can it work with proc import? Here is my code:

proc import out = sasuser.frais
  datafile="~/NAS/M/XXX.txt"
  dbms=dlm replace;
  delimiter=";";
  getnames=yes;
  datarow=2;
run;

I also tried to use data infile but for some reason it moves my lines of data (the beginning of the second line starts at the end of the first one, probably because most of the time my last 4 columns/variables are empty). And I have more than 20 variables, most of which are properly imported by sas (in terms of format I mean), so if I could avoid naming a format for each of them, it would be great!

Upvotes: 1

Views: 20740

Answers (3)

videorama17
videorama17

Reputation: 25

My problem is that I plan to use this importation code for multiple files and the order of the observations for type is random. So it's totally possible that for 10 or 100 row, the observations have a length of 3 and then one observation of 6. But even if I don't know the order, I know the differents values possible (with a max length of 6). So guessingtows wouldn't work in my case. But I found a way to correct my code with the link provided by John Doe (https://communities.sas.com/t5/Base-SAS-Programming/Text-File-Import-force-the-correct-Informat-and-format/td-p/281701) I took the data generated by the proc import and modified it for the one variable which needed changes. Thank you all!

Upvotes: 0

Richard
Richard

Reputation: 27508

Try using the GUESSINGROWS statement:

Proc IMPORT … ;
  …
  GUESSINGROWS = 100;
run;

When using a DATA step with an INPUT statement, and some rows are incomplete, your INFILE statement should specify the MISSOVER option. This will help you create a data set where each row corresponds to each row of the data file.

  INFILE … FIRSTOBS=2 DLM=';' DSD MISSOVER;
  length v1-v20 $30;
  INPUT v1-v20;

During the data step implicit loop step, any variables at the end of the data line without corresponding input values will be set to missing, and the input statement will not attempt to look for more values on the next line.

Upvotes: 2

user667489
user667489

Reputation: 9569

You may be able to get acceptable results by tweaking the guessingrows parameter rather than specifying an informat, e.g.:

data _null_;
  file "%sysfunc(pathname(work))\test.txt";
  infile cards;
  input;
  put _infile_;
  cards;
RAN
FG
MAJOFG
ARBA
ARBV
;
run;

proc import out = example
  datafile="%sysfunc(pathname(work))\test.txt"
  dbms=dlm replace;
  delimiter=";";
  getnames=yes;
  datarow=2;
  guessingrows=3;
run;

If you change this from 3 to 2, proc import only looks at the first 2 data rows when guessing an appropriate length for the variable RAN and truncation occurs for subsequent rows with longer values.

Upvotes: 0

Related Questions