Georgina Davenport

Reputation: 203

How can Fortran find a string in an unformatted binary file?

I am writing a Fortran 90 program to read .wav audio files.

Within the .wav format there is a chunk introduced by a string 'WAVE'. Within this chunk must appear two subchunks introduced by the strings 'fmt ' and 'data'.

In the particular .wav file I am using to test the code, the 'WAVE' string is followed by a gap of 36 characters, beginning with the word 'JUNK', before the 'fmt ' subchunk appears (picture supplied below).

The online resources I have read do not indicate that such gaps are to be expected. The expectation is that 'fmt ' appears directly after 'WAVE'.

.wav file format description

I don't want my code to collapse when it encounters atypical formatting.

There appears to be no way to predetermine where the 'fmt ' string appears in the file. My strategy is to search the file for it and then simply discard the rogue section beginning with 'JUNK'.

My initial attempts to search the file stream using SCAN or INDEX have failed, because passing the unit number of the open file to these intrinsic functions produces an error reporting that the argument is not a string.

It may aid clarity to read my code as it is so far.

program main
  
  use iso_fortran_env

  !=========================================================================

  !Variables for .wav header.
  character(4)     :: ChunkID = '____'
  integer  (4)     :: FileSize
  character(4)     :: Wave = 'WAVE'

  !fmt need only be character(4) but is extended here for illustration output.
  character(40)    :: fmt = 'fmt '
  
  !=========================================================================

  !Working variables for file handling.
  integer  (1)  :: args
  character(30) :: file
  integer :: stat

  !Exit when no file name is supplied. 
  args = command_argument_count()
  if(args.ne.1)then
     print *
     print *, 'Error. Enter .wav file name'
     print *, 'Example: cat'
     print *, "NB. The '.wav' extension is assumed. You don't need to add it."
     stop
  end if
  call GET_COMMAND_ARGUMENT(1,file)

  !Construct .wav file name.
  file =  trim(file) // '.wav'

  !Try opening .wav file with name supplied
  OPEN(UNIT=1, iostat=stat, FILE=file, &
       form='unformatted', access='stream', status='old')

  !Test file status and exit on error.
  if(stat.ne.0) then
     write(*,'(a)') 'No known file named ', file
     stop
  end if
  print *, 'File existence test: Passed'

  ! Header read.
  read(1) ChunkID, FileSize, Wave, fmt
  print *, 'ChunkID: ', ChunkID
  print *, 'FileSize: ', FileSize
  print *, '"WAVE": ', wave
  print *, '"fmt ":', fmt

END PROGRAM MAIN

The output the program produces using my downloaded trial .wav file is this:

enter image description here

The trouble starts with the unwanted text that follows "fmt ": in the output, ahead of the actual 'fmt ' string at the end.

My purpose is to discard this redundant string then continue reading the file from the expected string 'fmt '.

What intrinsics should I use in Fortran to absorb and discard useless file contents while retaining the parts I need, in a file containing an assortment of data types?

Upvotes: 1

Views: 369

Answers (1)

I use this subroutine to move the file position to just past the searched-for string str:

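! Advance the file position to just past the first occurrence of str.
! `unit` is assumed to be available by host or use association and to be
! connected with form='unformatted', access='stream'.
subroutine skip_to(str, stat)
  character(*), intent(in) :: str
  integer, intent(out) :: stat   ! 0 = found, 1 = not found (EOF or read error)
  character :: ch
  integer :: io
  integer(selected_int_kind(18)) :: pos   ! wide enough for multi-gigabyte files

  do
    read(unit, iostat=io) ch

    if (io/=0) then
      stat = 1
      return
    end if

    if (ch==str(1:1)) then
      ! Remember where this candidate continues, so that a failed partial
      ! match (e.g. 'ffmt ' when searching for 'fmt ') does not swallow the
      ! start of the real match.
      inquire(unit, pos=pos)
      call check(str(2:), stat)
      if (stat == 0) return
      read(unit, pos=pos, iostat=io)   ! empty read: reposition only
    end if

  end do
end subroutine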

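! Compare the next len(str) bytes of the file with str.
! stat = 0 on a full match, 1 on a mismatch or read error.
subroutine check(str, stat)
  character(*), intent(in) :: str
  integer, intent(out) :: stat
  character :: ch
  integer :: i, io

  stat = 1
  i = 0

  ! Nothing left to compare (single-character search string): it matches.
  if (len(str)==0) then
    stat = 0
    return
  end if

  do
    i = i + 1

    read(unit, iostat=io) ch

    if (io/=0) return

    if (ch/=str(i:i)) return

    if (i==len(str)) then
      stat = 0
      return
    end if
  end do
end subroutine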

It might be very inefficient, because it reads one byte at a time for maximum simplicity. It reads a byte and checks whether the string might start there; if so, it checks whether the next byte is the right one, and so on.
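For illustration, the same byte-at-a-time idea can also be written as a sliding window over the last few bytes read. This is only a sketch, not the routines above: the name skip_past, the scratch file skip_demo.bin and the test data are made up for the example, and it assumes 4-byte default integers.

```fortran
program skip_demo
  implicit none
  character(len=*), parameter :: fname = 'skip_demo.bin'  ! hypothetical scratch file
  integer :: u, stat, next_value

  ! Test data: 'WAVE', a 36-byte gap starting with 'JUNK', then 'fmt '
  ! followed by one integer. The gap ends in 'f's on purpose, so the
  ! marker is preceded by a partial match ('...ffmt ').
  open(newunit=u, file=fname, form='unformatted', access='stream', status='replace')
  write(u) 'WAVE', 'JUNK', repeat('f', 32), 'fmt ', 16
  close(u)

  open(newunit=u, file=fname, form='unformatted', access='stream', status='old')
  call skip_past(u, 'fmt ', stat)
  if (stat /= 0) error stop "'fmt ' not found"
  read(u) next_value                 ! first item after the marker
  print '(a,i0)', 'value after marker: ', next_value
  close(u, status='delete')

contains

  ! Sliding-window search: keep the last len(str) bytes read and compare
  ! them with str after every byte. On success the file position is just
  ! past the marker and stat = 0; otherwise stat = 1.
  subroutine skip_past(u, str, stat)
    integer, intent(in) :: u
    character(*), intent(in) :: str
    integer, intent(out) :: stat
    character(len=len(str)) :: window
    character :: ch
    integer :: io, nread

    stat = 1
    nread = 0
    window = ''
    do
      read(u, iostat=io) ch
      if (io /= 0) return
      window = window(2:) // ch      ! drop the oldest byte, append the newest
      nread = nread + 1
      if (nread >= len(str) .and. window == str) then
        stat = 0
        return
      end if
    end do
  end subroutine skip_past

end program skip_demo
```

Because the window always holds the most recent bytes, a failed partial match never consumes the beginning of the real one, and no repositioning is needed.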


Note that I often have to search for a string in the middle of a very large vtk file (gigabytes).

If you actually have just a small header, I would read the whole header into a long string and process it in memory using string-oriented routines.
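A minimal sketch of that in-memory approach might look like this. The 256-byte bound on the header, the scratch file header_demo.bin and the test data are assumptions made for the example, as is the 4-byte default integer.

```fortran
program find_fmt_in_header
  implicit none
  integer, parameter :: header_len = 256                    ! assumed upper bound on header size
  character(len=*), parameter :: fname = 'header_demo.bin'  ! hypothetical scratch file
  character(len=header_len) :: header
  integer :: u, io, p, fsize, n, chunk_size

  ! Build a small test file that mimics the layout from the question:
  ! 'RIFF', a 4-byte size, 'WAVE', 36 bytes starting with 'JUNK', then 'fmt '.
  open(newunit=u, file=fname, form='unformatted', access='stream', status='replace')
  write(u) 'RIFF', 0, 'WAVE', 'JUNK', repeat('x', 32), 'fmt ', 16
  close(u)

  open(newunit=u, file=fname, form='unformatted', access='stream', status='old')

  ! Read at most header_len bytes, or the whole file if it is shorter.
  inquire(unit=u, size=fsize)
  n = min(fsize, header_len)
  header = ''
  read(u, iostat=io) header(1:n)
  if (io /= 0) error stop 'could not read header'

  ! INDEX works here because header is a character variable, not a unit number.
  p = index(header(1:n), 'fmt ')
  if (p == 0) error stop "'fmt ' not found in header"
  print '(a,i0)', "'fmt ' starts at byte ", p

  ! Reposition the stream just past the marker and carry on reading binary data.
  read(u, pos=p + 4) chunk_size
  print '(a,i0)', 'value after marker: ', chunk_size

  close(u, status='delete')
end program find_fmt_in_header
```

Since stream positions are 1-based, the value returned by INDEX can be used directly in POS= to continue reading from the file.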

Upvotes: 1
