user1220978
user1220978

Reputation:

Reading a file of lists of integers in Fortran

I would like to read a data file with a Fortran program, where each line is a list of integers.

Each line has a variable number of integers, separated by a given character (space, comma...).

Sample input:

1,7,3,2
2,8
12,44,13,11

I have a solution to split lines, which I find rather convoluted:

module split
    implicit none
contains
    function string_to_integers(str, sep) result(a)
        integer, allocatable :: a(:)
        integer :: i, j, k, n, m, p, r
        character(*) :: str
        character :: sep, c
        character(:), allocatable :: tmp

        !First pass: find number of items (m), and maximum length of an item (r)
        n = len_trim(str)
        m = 1
        j = 0
        r = 0
        do i = 1, n
            if(str(i:i) == sep) then
                m = m + 1
                r = max(r, j)
                j = 0
            else
                j = j + 1
            end if
        end do
        r = max(r, j)

        allocate(a(m))
        allocate(character(r) :: tmp)

        !Second pass: copy each item into temporary string (tmp),
        !read an integer from tmp, and write this integer in the output array (a)
        tmp(1:r) = " "
        j = 0
        k = 0
        do i = 1, n
            c = str(i:i)
            if(c == sep) then
                k = k + 1
                read(tmp, *) p
                a(k) = p
                tmp(1:r) = " "
                j = 0
            else
                j = j + 1
                tmp(j:j) = c
            end if
        end do
        k = k + 1
        read(tmp, *) p
        a(k) = p
        deallocate(tmp)
    end function
end module

My question:

Here is the current program:

program read_data
    use split
    implicit none
    integer :: q
    integer, allocatable :: a(:)
    character(80) :: line
    open(unit=10, file="input.txt", action="read", status="old", form="formatted")
    do
        read(10, "(A80)", iostat=q) line
        if(q /= 0) exit
        if(line(1:1) /= "#") then
            a = string_to_integers(line, ",")
            print *, ubound(a), a
        end if
    end do
    close(10)
end program

A comment about the question: usually I would do this in Python, for example converting a line would be as simple as a = [int(x) for x in line.split(",")], and reading a file is likewise almost a trivial task. And I would do the "real" computing stuff with a Fortran DLL. However, I'd like to improve my Fortran skills on file I/O.

Upvotes: 2

Views: 2836

Answers (2)

I don't claim it is the shortest possible, but it is much shorter than yours. And once you have it, you can reuse it. I don't completely agree with these claims how Fotran is bad at string processing, I do tokenization, recursive descent parsing and similar stuff just fine in Fortran, although it is easier in some other languages with richer libraries. Sometimes you can use the libraries written in other languages (especially C and C++) in Fortran too.

If you always use the comma you can remove the replacing by comma and thus shorten it even more.

function string_to_integers(str, sep) result(a)
    integer, allocatable :: a(:)
    character(*) :: str
    character :: sep
    integer :: i, n_sep

    n_sep = 0
    do i = 1, len_trim(str)
      if (str(i:i)==sep) then
        n_sep = n_sep + 1
        str(i:i) = ','
       end if
    end do
    allocate(a(n_sep+1))
    read(str,*) a
end function

Potential for shortening: view the str as a character array using equivalence or transfer and use count() inside of allocate to get the size of a.

The code assumes that there is just one separator between each number and there is no separator before the first one. If multiple separators are allowed between two numbers, you have to check whether the preceding character is a separator or not

    do i = 2, len_trim(str)
      if (str(i:i)==sep .and. str(i-1:i-1)/=sep) then
        n_sep = n_sep + 1
        str(i:i) = ','
       end if
    end do

Upvotes: 4

Mark S
Mark S

Reputation: 306

My answer is probably too simplistic for your goals but I have spent a lot of time recently reading in strange text files of numbers. My biggest problem is finding where they start (not hard in your case) then my best friend is the list-directed read.

read(unit=10,fmt=*) a

will read in all of the data into vector 'a', done deal. With this method you will not know which line any piece of data came from. If you want to allocate it then you can read the file once and figure out some algorithm to make the array larger than it needs to be, like maybe count the number of lines and you know a max data amount per line (say 21).

    status = 0
    do while ( status == 0)
      line_counter = line_counter + 1
      read(unit=10,, iostat=status, fmt=*)
    end do

allocate(a(counter*21))

If you want to then eliminate zero values you can remove them or pre-seed the 'a' vector with a negative number if you don't expect any then remove all of those.

Another approach stemming from the other suggestion is to first count the commas then do a read where the loop is controlled by

do j = 1, line_counter         ! You determined this on your first read
  read(unit=11,fmt=*) a(j,:)   ! a is now a 2 dimensional array (line_counter, maxNumberPerLine)
                               ! You have a separate vector numberOfCommas(j) from before
end do

And now you can do whatever you want with these two arrays because you know all the data, which line it came from, and how many data were on each line.

Upvotes: 0

Related Questions