Reputation: 37
I am trying to import several .txt files in to SAS. The values are separated by semi colons and each file has several million observations. The problem I have is when importing the file one of the variables gets the wrong formatting. The original value per observation is a random number (though all of approximately 20-25 digits) and for one obs might be something like 301185964728506014850593. When imported SAS then, for some reason, reads it as (numeric, correctly but not importantly) 3.0118596E23. How can i make SAS read exactly what is in the .txt file?
Searching for the documentation has not been fruitful at all, with very few additional options found. The code below is the standard code I use to import the files (using mixed=yes does not seem to help and expanding the guessingrows changes nothing exept when i have it set to max which just makes the whole system crash).
proc import DATAFILE="W:FILE1.txt" OUT=FILE1 DBMS=dlm replace; delimiter=';'; guessingrows=100; run;
When this is read i get the error message "ERROR: Import unsuccessful. See SAS Log for details." However the file is imported and i can see that all rows and variables are included. I simply want to change the format or possibly the variable length (call the variable ID) in the proc import stage. Is this possible?
Upvotes: 1
Views: 3356
Reputation: 27508
There is a SAS limit to the how large a numeric can be and still retain exact integer representation. (Based on binary limits of mantissa and exponent features in double precision floating point values)
From the "SAS Companion for Windows"
Significant Digits and Largest Integer by Length for SAS Variables under Windows Length Largest in Integer Significant Bytes Represented Exponential Digits Exactly Notation Retained ------ ----------------------- ----------- ----------- 3 8,192 213 3 4 2,097,152 221 6 5 536,870,912 229 8 6 137,438,953,472 237 11 7 35,184,372,088,832 245 13 8 9,007,199,254,740,992 253 15
There is no option for directly specifying the column formats with the IMPORT
Procedure.
You can recall the DATA Step source code that the procedure creates and modify that.
From the documentation File "Format-Specific Reference for the IMPORT and EXPORT Procedures" Delimited Files
Processing Delimited Files in SAS
When you use PROC IMPORT to read a comma-separated file, a tab-separated file, or other delimited file, the procedure does the following actions by default:
…
- creates a DATA step with an INPUT statement
- submits all of the code to the DATA step compiler, which, in turn, executes the code.
…
If you need to revise your code after the procedure runs, issue the RECALL command (or press F4) to the generated DATA step. At this point, you can add or remove options from the INFILE statement and customize the INFORMAT, FORMAT, and INPUT statements to your data.
So the steps will be
Proc IMPORT
that outputs zero rows.RECALL
command in the command bar (or menu Run/Recall Last Submit)where=(1=0))
INFORMAT
from best32.
to something like $32.
FORMAT
statementExample:
Create sample data set with too big integers and IMPORT
it
filename myfile temp;
data _null_;
file myfile;
put "one;two;three";
put "1;2;3";
put "301185964728506014850593;301185964728506014850594;301185964728506014850595";
put "301185964728506014850593;301185964728506014850594;301185964728506014850595";
put "301185964728506014850593;301185964728506014850594;301185964728506014850595";
put "301185964728506014850593;301185964728506014850594;301185964728506014850595";
put "301185964728506014850593;301185964728506014850594;301185964728506014850595";
put "301185964728506014850593;301185964728506014850594;301185964728506014850595";
run;
proc import
file=myfile
dbms=dlm
replace
out=myimport(where=(1=0) /* output limiter */
;
delimiter=';';
run;
Recall the SAS source code, edit it and resubmit
/**********************************************************************
* PRODUCT: SAS
* VERSION: 9.4
* CREATOR: External File Interface
* DATE: 07NOV19
* DESC: Generated SAS Datastep Code
* TEMPLATE SOURCE: (None Specified.)
***********************************************************************/
data WORK.MYIMPORT
/*(where=(1=0)) */ /* <------ remove limiter */
;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile MYFILE delimiter = ';' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat one $32. ; /* <-------- change informats */
informat two $32. ;
informat three $32. ;
/* format one best12. ;*/ /* <--------- remove format statements */
/* format two best12. ;*/
/* format three best12. ;*/
input
one
two
three
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
Yields the data set
Obs one two three
1 1 2 3
2 301185964728506014850593 301185964728506014850594 301185964728506014850595
3 301185964728506014850593 301185964728506014850594 301185964728506014850595
4 301185964728506014850593 301185964728506014850594 301185964728506014850595
5 301185964728506014850593 301185964728506014850594 301185964728506014850595
6 301185964728506014850593 301185964728506014850594 301185964728506014850595
7 301185964728506014850593 301185964728506014850594 301185964728506014850595
Upvotes: 0
Reputation: 1000
To have a greater level of control when reading in a file of raw data, you should consider using a data step, e.g.:
data want;
infile datalines4 dlm = ';' dsd;
input
id : $char25.
v1 : $char10.
v2 : 8.
;
datalines4;
301185964728506014850593;abc;123
30118596472850601485059;abcd;1234
3011859647285060148505;abcde;12345
;;;;
This way you can specify the id to be character and so data will be held as presented in the file.
Upvotes: 1