ThunderFlash
ThunderFlash

Reputation: 522

Reading arrays from .npy files into Fortran 90

I am using Python to generate some initial data, in the form of 2D arrays such as 'X', and then using Fortran to do some calculations with them. Initially when the array sizes were around 10,000 x 10,000, np.savetxt worked fine in terms of speed. But once I start increasing the dimensions of the arrays, the speed slows significantly for savetxt. So I tried np.save and this results in much faster saving speeds but the file is saved in the .npy format. How would I go about reading such a file in Fortran to reconstruct the original array? From what I've read, binary usually results in the lowest space consumption and fastest speeds.

In Fortran 90,

open(10,file='/home/X.npy')

read(10,*)X

I get the following error : Fortran runtime error: Bad real number in item 1 of list input

edit : In python I do this,

import numpy as np 
x = np.linspace(-50,50,10)
y = np.linspace(-50,50,10) 
X,Y = np.meshgrid(x,y) 
np.save('X',X) 

Then in Fortran I do this,

program read_unformatted_binary_from_python
    use iso_c_binding
    implicit none

    integer, parameter :: DPR = selected_real_kind(p=15)
    character(len=*), parameter :: filename = 'X.npy'

    integer, parameter :: N = 10
    real(DPR), allocatable, dimension(:, :) :: X

    allocate(X(N, N))


    open(40, file=filename, status='old', access='stream', form='unformatted')
    read(40) X
    close(40)

    write(*,*) X

end program read_unformatted_binary_from_python

The output starts with 1.8758506894003703E-309 1.1711999948023422E+171 5.2274167985502976E-037 8.4474009688929314E+252 2.6514123210345660E+180 9.9215260506210473E+247 2.1620996769994603E+233 7.5805790251297605E-096 3.4756671925988047E-152 6.5549091408576423E-260 -50.000000000000000 -38.888888888888886 -27.777777777777779 and so on

This error doesn't occur when using the .bin format, only when using np.save which results in npy.

Upvotes: 5

Views: 1791

Answers (1)

CAZT
CAZT

Reputation: 111

Vladimir F is correct that you want 'stream' access of a 'raw binary' file in Fortran. Here's a MWE:

Python

import numpy as np
A = np.random.rand(10000, 10000)
print(A.sum())
A.tofile('data.bin')

Fortran

program read_unformatted_binary_from_python
    use iso_c_binding
    implicit none

    integer, parameter :: DPR = selected_real_kind(p=15)
    character(len=*), parameter :: filename = 'data.bin'

    integer, parameter :: N = 10000
    real(DPR), allocatable, dimension(:, :) :: dat

    allocate(dat(N, N))

    open(40, file=filename, status='old', access='stream', form='unformatted')
    read(40) dat
    close(40)

    write(*,*) sum(dat)

end program read_unformatted_binary_from_python

My Fortran example is perhaps a bit longer than necessary, as I use many different systems and compiler suites, and also hate large static arrays (I am a Fortran user after all).

I coded this up real quick with Python 2.7.x, Numpy 13.x, and gfortran from Homebrew GCC 6.3.0_1 on a MacBook Pro, but this should work on all systems.

UPDATE: Special care needs to be taken with the array shapes and sizes here. If dat is allocated to be larger than what is in the file, than the streaming read should try to fill the entire array, hit the EOF symbol, and issue an error. In Python, the np.fromfile() method will read up to EOF then return a 1D array with the appropriate length, even if A was originally multidimensional. This is because a raw binary has no metadata and is simply a contiguous string of bytes from RAM.

As a result, the following lines in Python produce the same file:

A = np.random.rand(10000, 10000)
A.tofile('file.whatever')
A.ravel().tofile('file.whatever')
A.reshape((100, 1000, 1000)).tofile('file.whatever')

And that file can be read in and reshaped as:

B = np.fromfile('file.whatever').reshape(A.shape)
B = np.fromfile('file.whatever').reshape((100, 1000, 100, 10))
# or something like
B = np.fromfile('file.whatever') # just a 1D array
B.resize(A.shape)  # resized in-place

In Fortran it is easy to read an entire raw file using stream access without knowing its size ahead of time, though you'll obviously need some kind of user input in order to reshape the data:

program read_unformatted_binary_from_python
    use iso_c_binding
    implicit none

    integer, parameter :: DPR = selected_real_kind(p=15)
    character(len=*), parameter :: filename = 'data.bin'
    integer :: N = 10000, bytes, reals, M
    real(DPR), allocatable :: A(:,:), D(:, :), zeros(:)
    real(DPR), allocatable, target :: B(:)
    real(DPR), pointer :: C(:, :)

    allocate(A(N, N))

    open(40, file=filename, status='old', access='stream', form='unformatted')

    read(40) A
    write(*,*) 'sum of A', sum(A)

    inquire(unit=40, size=bytes)
    reals = bytes/8
    allocate(B(reals))

    read(40, pos=1) B
    write(*,*) 'sum of B', sum(B)

    ! now reshape B in-place assuming the user wants the first dimension 
    ! (which would be the outer dimension in Python) to be length 100
    N = 100
    if(mod(reals, N) == 0) then
         M = reals/N
         call C_F_POINTER (C_LOC(B), C, [N, M])
         write(*, *) 'sum of C', sum(C)
         write(*, *) 'shape of C', shape(C)
    else
         write(*,*) 'file size is not divisible by N!, did not point C to B'
    end if

    ! now reshape B out-of-place with first dimension length 9900, and
    ! pad the result so that there is no size mismatch error  
    N = 9900
    M = reals/N
    if(mod(reals, N) > 0) M=M+1

    allocate(D(N, M))
    allocate(zeros(N), source=real(0.0, DPR))
    D = reshape(B, [N, M], pad=zeros)

    write(*,*) 'sum of D', sum(D)
    write(*,*) 'shape of D', shape(D)

    ! obviously you can also work with shape(A) in fortran the same way you
    ! would use A.shape() in Python, if you already knew that A was the
    ! correct shape of the data
    deallocate(D)
    allocate(D, mold=A)
    D = reshape(B, shape(A))
    write(*,*) 'sum of D', sum(D)
    write(*,*) 'shape of D', shape(D)

    ! or, just directly read it in, skipping the whole reshape B part
    read(40, pos=1) D
    write(*,*) 'sum of D', sum(D)

    close(40)

end program read_unformatted_binary_from_python

Upvotes: 3

Related Questions