offeltoffel

Reputation: 2801

Reading Fortran binary (stream access) with np.fromfile or open & struct

The following Fortran code:

INTEGER*2 :: i, Array_A(32)
Array_A(:) = (/ (i, i=0, 31) /)

OPEN (unit=11, file = 'binary2.dat', form='unformatted', access='stream')
    Do i=1,32
        WRITE(11) Array_A(i)
    End Do 
CLOSE (11)

produces stream-access binary output containing the numbers 0 to 31 as 16-bit integers. Each value takes up 2 bytes, so they are written at bytes 1, 3, 5, 7 and so on. access='stream' suppresses the record markers that Fortran normally writes for each record (I need that to keep the files as small as possible).

Looking at it with a Hex-Editor, I get:

00 00 01 00 02 00 03 00 04 00 05 00 06 00 07 00 
08 00 09 00 0A 00 0B 00 0C 00 0D 00 0E 00 0F 00
10 00 11 00 12 00 13 00 14 00 15 00 16 00 17 00
18 00 19 00 1A 00 1B 00 1C 00 1D 00 1E 00 1F 00

which is completely fine (the second byte of each pair is always zero only because the values in my example are so small).

Now I need to import these binary files into Python 2.7, but I can't get it to work. I have tried several approaches, and all of them fail.

1. attempt: "np.fromfile"

with open("binary2.dat", 'r') as f:
    content = np.fromfile(f, dtype=np.int16)

returns

[    0     1     2     3     4     5     6     7     8     9    10    11
    12    13    14    15    16    17    18    19    20    21    22    23
    24    25     0     0 26104  1242     0     0]

2. attempt: "struct"

import struct
with open("binary2.dat", 'r') as f:
    content = f.readlines()
    struct.unpack('h' * 32, content)

delivers

struct.error: unpack requires a string argument of length 64

because

print content
['\x00\x00\x01\x00\x02\x00\x03\x00\x04\x00\x05\x00\x06\x00\x07\x00\x08\x00\t\x00\n', '\x00\x0b\x00\x0c\x00\r\x00\x0e\x00\x0f\x00\x10\x00\x11\x00\x12\x00\x13\x00\x14\x00\x15\x00\x16\x00\x17\x00\x18\x00\x19\x00']

(note the split into two strings, plus the \t and the \n, which shouldn't be there according to what Fortran's stream access does)

3. attempt: "FortranFile"

f = FortranFile("D:/Fortran/Sandbox/binary2.dat", 'r')
print(f.read_ints(dtype=np.int16))

With the error:

TypeError: only length-1 arrays can be converted to Python scalars

(remember that attempt 2 detected a delimiter in the middle of the file; this one also fails for shorter files without a line break, e.g. the numbers 0 to 8)

Some additional thoughts:

Python seems to have trouble reading parts of the binary file. np.fromfile reads hex 19 (decimal 25) correctly but goes wrong at hex 1A (decimal 26). It seems to be confused by the letters, although 0A, 0B ... work just fine.

For attempt 2, the resulting content is odd. The numbers 0 to 8 come out fine, but then there is this strange \t\x00\n sequence. What is going on with hex 09?

I've spent hours trying to find the logic, but I'm stuck and really need some help. Any ideas?

Upvotes: 1

Views: 528

Answers (1)

Stanislav Ivanov

Reputation: 1974

The problem is the mode in which the file is opened. The default is text mode. Open the file in binary mode instead:

import numpy as np

with open("binary2.dat", 'rb') as f:  # 'rb' reads raw bytes, no text-mode translation
    content = np.fromfile(f, dtype=np.int16)

and all the numbers will be read successfully. See the Binary Files chapter of Dive Into Python for more details.
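As a minimal, self-contained sketch of the same fix, the snippet below recreates a file with the 64 bytes shown in the question's hex dump and reads it back in binary mode, once with np.fromfile and once with struct. The temporary path and the explicit little-endian format strings ("<i2", "<32h") are assumptions made for illustration; they match the byte order in the hex dump. The symptoms in the question are consistent with Windows text mode, where byte 0x1A (Ctrl-Z) is typically treated as end-of-file, and with readlines() splitting the data at every 0x0A byte.

```python
import os
import struct
import tempfile

import numpy as np

# Recreate the file from the question: 32 little-endian int16 values, 0..31.
# (The temporary directory is only for this sketch; use your own path.)
path = os.path.join(tempfile.mkdtemp(), "binary2.dat")
with open(path, "wb") as f:
    f.write(struct.pack("<32h", *range(32)))

# Variant 1: np.fromfile on a handle opened in binary mode.
with open(path, "rb") as f:
    from_numpy = np.fromfile(f, dtype="<i2")

# Variant 2: struct. Read the whole file as one byte string with read(),
# not readlines(), so bytes such as 0x0A are not treated as line breaks.
with open(path, "rb") as f:
    raw = f.read()
from_struct = struct.unpack("<%dh" % (len(raw) // 2), raw)

print(from_numpy.tolist())
print(list(from_struct))
```

Both variants should give the numbers 0 to 31. scipy.io.FortranFile is not a good fit here, because it is meant for unformatted sequential files with record markers, which access='stream' leaves out.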

Upvotes: 3
