drjrm3

Reputation: 4718

Reading MANY files at once in Fortran

I have 500,000 files which I need to read in Fortran and each file has ~14,000 entries in it (each entry is only about 100 characters long). I need to process each line for each file at a time. For example, I need to process line 1 for all 500,000 files before moving on to line 2 from the files and so forth.

I cannot open them all at once (I tried making an array of file pointers and opening them all) because there will be too many files open at once. Instead, I would like to do something as follows:

 do iline = 1,Nlines
   do ifile = 1,Nfiles
     ! open the file
     ! read a line
     ! close the file
   enddo
 enddo

In hopes that this would allow me to read one line at a time (from each file) and then move on to the next line (in each file). Unfortunately, each time I open the file it starts me off at line 1 again. Is there any way to open/close a file and then open it again where you left off previously?

Thanks

Upvotes: 1

Views: 717

Answers (4)

agentp

Reputation: 6989

The overhead of all the file opening and closing will be a big performance bottleneck. You should read as much as you can per open operation, given whatever memory you have:

pseudocode:

    loop until done:
      loop over all files:
          open
          fseek                     ! as in damien's answer
          read N lines into array   ! e.g. N = 100
          save ftell value for file
          close
      end file loop
      loop over N output files:
          open
          write array data
          close
      end file loop

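A minimal sketch of this batching idea in standard Fortran, using stream access and saved positions instead of the non-standard fseek/ftell (the file names, unit handling, and batch size are placeholders, not part of the original answer):

```fortran
program batched_read
  implicit none
  integer, parameter :: Nfiles = 4, Nbatch = 100   ! placeholder sizes
  integer :: pos(Nfiles)                           ! saved stream position per file
  character(len=100) :: line
  character(len=64) :: fname
  integer :: ifile, j, u, ios
  logical :: done

  pos = 1                         ! Fortran stream positions are 1-based
  done = .false.
  do while (.not. done)
    done = .true.
    do ifile = 1, Nfiles
      write(fname, '(a,i0,a)') 'input_', ifile, '.txt'   ! hypothetical names
      open(newunit=u, file=fname, access='stream', form='formatted', &
           action='read')
      do j = 1, Nbatch
        if (j == 1) then
          read(u, '(a)', pos=pos(ifile), iostat=ios) line  ! resume at saved spot
        else
          read(u, '(a)', iostat=ios) line
        end if
        if (ios /= 0) exit
        done = .false.
        ! ... buffer/process line here ...
      end do
      inquire(unit=u, pos=pos(ifile))   ! remember where this file stopped
      close(u)
    end do
  end do
end program batched_read
```

Each file is opened only Nlines/Nbatch times instead of Nlines times, which is where the savings come from.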
Upvotes: 0

M. S. B.

Reputation: 29381

If the files have fixed, i.e., constant, record lengths, you could use direct access and "directly" read a specific record. A big "if", however.

Upvotes: 0

Unfortunately, this is not possible this way in standard Fortran. Even if you specify

position="ASIS"

the actual position will be unspecified for a unit that is not already connected, and in practice it will be the beginning of the file on most systems.

That means you have to execute

  read(u,*)

enough times (on the file's unit u) to get to the right place in the file.
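For example, skipping the already-consumed lines on each reopen could look like this (a sketch; the file name and the value of nskip are placeholders):

```fortran
program skip_lines
  implicit none
  integer :: u, i, nskip
  character(len=100) :: line

  nskip = 3                                            ! lines read in earlier opens
  open(newunit=u, file='data_001.txt', action='read')  ! hypothetical file name
  do i = 1, nskip
    read(u, *)                                         ! read and discard one record
  end do
  read(u, '(a)') line                                  ! the next unread line
  nskip = nskip + 1                                    ! remember for the next reopen
  close(u)
end program skip_lines
```

This is standard-conforming but O(iline) work per reopen, so it gets slower as you move through the files.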

You could also use stream access. The file would again be opened at the beginning, but you can use

  read(u,*,pos=n) number

where n is the position saved from the previous open. You can get the position from

inquire(unit=u, pos=n)

You would open the file with access="STREAM".
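Putting those pieces together (a sketch; the file name is a placeholder):

```fortran
program stream_resume
  implicit none
  integer :: u, ios, n
  character(len=100) :: line

  ! First pass: read one line, remember where we stopped.
  open(newunit=u, file='data.txt', access='stream', form='formatted', &
       action='read')
  read(u, '(a)', iostat=ios) line
  inquire(unit=u, pos=n)        ! n = position just after the line we read
  close(u)

  ! Later: reopen and resume at the saved position.
  open(newunit=u, file='data.txt', access='stream', form='formatted', &
       action='read')
  read(u, '(a)', pos=n, iostat=ios) line   ! continues where we left off
  close(u)
end program stream_resume
```

Unlike the skip-lines approach, repositioning with pos= is O(1) per reopen.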

Also, 500,000 open files is indeed too many. There are ways to inquire about the system limits and to control them, but your compiler may have limits of its own: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/

Another solution: couldn't you store the content of the files in memory? A couple of gigabytes is fine today, but it may not be enough for you.
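Reading one whole file into memory could look like this (a sketch; note that all 500,000 files at ~14,000 lines of 100 characters each would need on the order of 700 GB, so this only works for a subset of files at a time):

```fortran
program file_in_memory
  implicit none
  integer, parameter :: Nlines = 14000, linelen = 100
  character(len=linelen), allocatable :: lines(:)
  integer :: u, i, ios

  allocate(lines(Nlines))
  open(newunit=u, file='data_001.txt', action='read')  ! hypothetical file name
  do i = 1, Nlines
    read(u, '(a)', iostat=ios) lines(i)
    if (ios /= 0) exit                                 ! stop at end of file
  end do
  close(u)
  ! lines(:) can now be indexed freely without reopening the file
end program file_in_memory
```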

Upvotes: 3

damienfrancois

Reputation: 59072

You can try using fseek and ftell (compiler extensions, e.g. in gfortran, not standard Fortran) in something like the following.

! initialize an array of 0's
do iline = 1,Nlines
   do ifile = 1,Nfiles
     ! open the file
     ! call fseek(fd, array(ifile), 0)
     ! read a line
     ! array(ifile) = ftell(fd)
     ! close the file
   enddo
enddo

The (untested) idea is to store the offset of each file in an array and position the cursor at that place upon opening the file. Then, once a line is read, ftell retrieves the current position, which is saved for the next round. If all entries have the same length, you can do without the array and store a single value.
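Filled in with gfortran's fseek/ftell extensions (other compilers may spell these differently; the file name is a placeholder), the sketch could become:

```fortran
program seek_resume
  implicit none
  integer :: u, offset, ios
  character(len=100) :: line

  offset = 0                      ! byte offset saved from the previous open
  open(newunit=u, file='data_001.txt', action='read')  ! hypothetical file name
  call fseek(u, offset, 0, ios)   ! 0 = seek from start (gfortran extension)
  read(u, '(a)') line             ! read one line
  offset = ftell(u)               ! remember the new byte offset
  close(u)
end program seek_resume
```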

Upvotes: 0
