Anonymous
Anonymous

Reputation: 83

How to read a data file with some condition faster in Fortran?

I am trying to write down a Fortran subroutine for my code in order to read a data from a file (which is a huge data set on itself).The data file contains the Location (nx0,ny0,nz0) and the field related to that location (Bx,By,Bz). (Ex: lets say the range for nx0, ny0 and nz0 is from [-15,15]. so the number of rows will be 31*31*31=29791)

-15.00000       -15.00000       -15.00000      700.00000     -590.00000      100.00000
-15.00000       -15.00000       -14.00000     -110.00000     -570.00000      100.00000
-15.00000       -15.00000       -13.00000     -550.00000     -200.00000      100.00000
-15.00000       -15.00000       -12.00000     -540.00000     -230.00000      100.00000
-15.00000       -15.00000       -11.00000     -140.00000     -50.00000      100.00000
 .              .               .           .              .             .
 .              .               .           .              .             .
 .              .               .           .              .             .
 15.00000       15.00000         15.00000     140.00000       50.00000       100.000

What I want to do is to look for a specific location within my file (xi,yi and zi) and read the field related to that location then use it for further analysis. Not only the related field to the target position itself but also the surrounding field of that location (Like the three other side of the square around the target point).

   subroutine read_data(xi,yi,zi,Bxij,Byij)
   real*8,intent(in) :: xi,yi,zi !,time   
   real*8,intent(out) :: Bxij(4),Byij(4) !,Bzij(4)
   integer,parameter :: step = 1 ,cols = 6, rows = 29791  !!!15,000,000
   real,dimension(rows) :: nx0,ny0,nz0,Bx,By,Bz
   character*15 filein
   character*35 path_file

    path_file = '/home/mehdi/Desktop/'
    filein= 'test-0001' 
    open(7,file=trim(path_file)//filein, status='old',action='read')

xi_1 = xi +step
yi_1 = yi +step

do i = 1,rows
read(7,*) nx0(i),ny0(i),nz0(i),Bx(i),By(i),Bz(i) 
c
    if ( xi == nx0(i) .and. yi == ny0(i) .and.
 &  zi == nz0(i)) then
      Bxij(1) = Bx(i) 
      Byij(1) = By(i)
      cycle
    endif
c
    if ( xi == nx0(i) .and. yi_1 == ny0(i) .and.
 &  zi == nz0(i)) then
      Bxij(2) = Bx(i)
      Byij(2) = By(i)
      cycle
    endif
c        
    if ( xi_1 == nx0(i) .and. yi == ny0(i) .and.
 &  zi == nz0(i)) then
      Bxij(3) = Bx(i)
      Byij(3) = By(i)
      cycle
    endif
c
    if ( xi_1 == nx0(i) .and. yi_1 == ny0(i) .and.
 &  zi == nz0(i)) then
     Bxij(4) = Bx(i)
     Byij(4) = By(i)
     exit
    endif
c
    close(7) 
    enddo
end

I have done it this way but it is too slow. One of the most important things for me is the speed (which even for this small fraction of data set is really time consuming).

I know this slow mode is for the needs to read the whole data set each time in order to look for the target points. This subroutine is called couple times within the code and for the further steps the code is going to do the same thing over and over again, so it is time consuming.

How can I make this code work more efficiently?

Upvotes: 2

Views: 1041

Answers (1)

chw21
chw21

Reputation: 8140

Before I begin this answer, let me reiterate what I said in the comments to your question:

Do not underestimate how much data you can put into a single array. Reading once, and then having everything in memory is still the fastest way possible.

But let's assume that the data really gets too big.

Your main issue seems to be that you have to re-read all the data from the beginning until you find the value you're looking for. That takes the time.

If you can calculate which line of the data file the value you are interested in is, it might help to convert the file into an unformatted direct access file.

Here is an example code for the conversion. It's using Fortran 2008 features, so if your compiler can't do it, you have to modify it:

program convert

    use iso_fortran_env, only: real64
    implicit none

    integer, parameter :: reclength = 6*8 ! Six 8-byte values

    integer :: ii, ios
    integer :: u_in, u_out
    real(kind=real64) :: pos(3), B(3)

    open(newunit=u_in, file='data.txt', form='formatted', &
        status='old', action='read', access='sequential')
    open(newunit=u_out, file='data.bin', form='unformatted', &
        status='new', action='write', access='direct', recl=reclength)
    ii = 0

    do
        ii = ii + 1
        read(u_in, *, iostat=ios) pos, B
        if (ios /= 0) exit
        write(u_out, rec=ii) pos, B
    end do

    close(u_out)
    close(u_in)

end program convert

Once you have converted the data, you can read only the record you need, as long as you can calculate which one it is. I have assumed that just like in your example, the z-coordinate changes fastest and the x-coordinate changes slowest.

program read_txt
    use iso_fortran_env, only: real64
    implicit none

    integer, parameter :: nx=601, ny=181, nz=61
    real(kind=real64), parameter :: x_min=real(-nx/2, kind=real64)
    real(kind=real64), parameter :: y_min=real(-ny/2, kind=real64)
    real(kind=real64), parameter :: z_min=real(-nz/2, kind=real64)
    real(kind=real64), parameter :: x_step = 1.0_real64
    real(kind=real64), parameter :: y_step = 1.0_real64
    real(kind=real64), parameter :: z_step = 1.0_real64

    real(kind=real64) :: request(3), pos(3), B(3)
    integer :: ios, u_in
    integer :: ii, jj, kk, record
    integer, parameter :: reclength = 6 * 8 ! Six 8-byte values

    open(newunit=u_in, file='data.bin', access='direct', form='unformatted', &
        status='old', action='read', recl=reclength)
    mainloop : do
        read(*, *, iostat=ios) request
        if (ios /= 0) exit mainloop
        write(*, '(A, 3F7.2)') 'searching for ', request
        ! Calculate record
        ii = nint((request(1)-x_min)/x_step)
        jj = nint((request(2)-y_min)/y_step)
        kk = nint((request(3)-z_min)/z_step)
        record = kk + jj * nz + ii * nz * ny + 1
        read(u_in, rec=record, iostat=ios) pos, B
        if (ios /= 0) then
            print *, 'failure to read'
            cycle mainloop
        end if
        write(*, '(2(A, 3F7.2))') "found pos: ", pos, " Bx, By, Bz: ", B
    end do mainloop
    close(u_in)
end program read_txt

Note that the unformatted is not compiler- and system independent. A file created on one computer or with a program compiled by one compiler might not be able to be read with another program or on another computer.

But if you have control over it, it might be a useful way to speed things up.

PS: I left the x, y, and z coordinates in the file so that you can check whether the values are actually what you wanted. Always good to verify these things.

Upvotes: 3

Related Questions