Reputation: 23
I have a binary file of fixed-length records created in MS SSIS which I need to read into 64-bit SAS 9.4. Currently the file is read within a data step using this code:
data outputdata.(EOC=no
                 compress=yes
                 keep=a b c);
  length a $4.;
  length b 4.;
  infile "&inputfile." obs=999999999 lrecl=308 recfm=F;
  input @5  a $4.
        @9  b ib4.
        @13 c rb4.
  ;
...
...
...
All variables are read correctly into the output dataset except c. c is a floating-point number with two decimal places, a minimum value of 0.00 and a maximum value of 99.99. In case it's useful, c starts life as a VB.Net Single value, which is converted to binary using VB.Net's BitConverter.GetBytes(Single), which returns a 4-byte array. This array is then written to the binary record.
From what I can tell from my research, rb4. is the correct way to read a 4-byte floating-point ('real'?) value from a binary record in SAS, so presumably the issue lies in how to then format that value so that it appears correctly in the output dataset. I've tried the following:
format c rb2.2;
format c 2.2;
format c 4.;
along with variations on the width and decimal values in the format statements (e.g. format c 5.;). None of the formats I've tried has produced anything close to the correct values; most result in numbers in scientific notation such as 17E9.
c is a new addition to the binary file and is the only 'real' variable contained within it so I don't have an example to work from. I'm new to SAS and have inherited this project so there's a good chance the issue is something fairly fundamental!
Any guidance appreciated. Thanks
Upvotes: 1
Views: 1333
Reputation: 1319
Repeating my comment as an answer...
You should use FLOAT4. to read a value that was written by the VB.NET BitConverter.GetBytes(Single) function. The RB4. informat reads four input bytes as if they are a truncated double-precision floating-point value, but the output of the VB.NET function is a single-precision floating-point value, aka a 'float', which is not the same thing.
The note on SAS's documentation page for the FLOATw.d informat explains:
The FLOATw.d informat is useful in operating environments where a float value is not the same as a truncated double.
On the IBM mainframe systems, a four-byte floating-point number is the same as a truncated eight-byte floating-point number. However, in operating environments that use the IEEE floating-point standard, such as the IBM PC-based operating environments and most UNIX platforms, a four-byte floating-point number is not the same as a truncated double. Therefore, the RB4. informat does not produce the same results as FLOAT4. Floating-point representations other than IEEE might have this same characteristic. Values read with FLOAT4. typically come from some other external program that is running in your operating environment.
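To make the difference concrete, here is a rough byte-level sketch in Python (not SAS). The function names are made up for illustration. It assumes the VB.NET side runs on a little-endian machine, so BitConverter.GetBytes(Single) yields a little-endian IEEE 754 single, and it assumes that reading those bytes as a "truncated double" means treating them as the high-order half of a little-endian double with the missing low-order bytes zeroed. That second assumption is my reading of the behaviour, not something taken from the SAS documentation.

```python
import struct


def single_to_bytes(value: float) -> bytes:
    """Pack a value as a little-endian IEEE 754 single (4 bytes).

    Assumption: this matches what BitConverter.GetBytes(Single)
    produces on a little-endian machine.
    """
    return struct.pack("<f", value)


def read_as_float4(raw: bytes) -> float:
    """Interpret 4 bytes as a single-precision float (the FLOAT4. idea)."""
    return struct.unpack("<f", raw)[0]


def read_as_truncated_double(raw: bytes) -> float:
    """Interpret 4 bytes as the high-order half of a double (the RB4. idea).

    Assumption: the four missing low-order bytes are treated as zero.
    """
    return struct.unpack("<d", b"\x00\x00\x00\x00" + raw)[0]


if __name__ == "__main__":
    for value in (0.5, 36.0, 99.99):
        raw = single_to_bytes(value)
        print(value, raw.hex(), read_as_float4(raw), read_as_truncated_double(raw))
```

Under these assumptions, a stored 36.0 comes back as 2**34 (about 1.7E10) when misread as a truncated double, which is at least consistent with the 17E9-style values reported in the question. In the data step itself, the change suggested above amounts to reading c with float4. instead of rb4. on the input line.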
Upvotes: 1