Reputation: 4718
I have 500,000 files that I need to read in Fortran, and each file has ~14,000 entries in it (each entry is only about 100 characters long). I need to process the files one line at a time across all files. For example, I need to process line 1 of all 500,000 files before moving on to line 2 of each file, and so forth.
I cannot open them all at once (I tried making an array of file pointers and opening them all) because there will be too many files open at once. Instead, I would like to do something as follows:
do iline = 1,Nlines
   do ifile = 1,Nfiles
      ! open the file
      ! read a line
      ! close the file
   enddo
enddo
I hoped this would allow me to read one line at a time (from each file) and then move on to the next line (in each file). Unfortunately, each time I open a file it starts me off at line 1 again. Is there any way to close a file and then open it again where I left off previously?
Thanks
Upvotes: 1
Views: 717
Reputation: 6989
The overhead of all the file opening/closing will be a big performance bottleneck. You should try to read as much as you can for each open operation, given whatever memory you have:
pseudocode:
loop until done:
   loop over all files:
      open
      fseek                      ! as in damien's answer
      read N lines into array    ! e.g. N=100
      save ftell value for file
      close
   end file loop
   loop over N output files:
      open
      write array data
      close
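The read half of that pseudocode can be sketched in standard Fortran as follows. This is only an illustration under assumptions: it swaps fseek/ftell for formatted stream access (the pos= specifier), the file names chunk_N.txt and the tiny sizes are made up for the demo, and nlines is assumed to be a multiple of chunk. The write step to output files is left out.

```fortran
program chunked_read
  implicit none
  integer, parameter :: nfiles = 3, nlines = 6, chunk = 2   ! demo sizes (hypothetical)
  character(len=100) :: buf(chunk, nfiles)  ! one chunk of lines per file
  integer :: fpos(nfiles)                   ! saved stream position per file
  integer :: u, ifile, i, first
  character(len=32) :: fname

  call make_demo_files()
  fpos = 1                                  ! position 1 is the start of a file

  do first = 1, nlines, chunk
     ! One open/close per file per chunk instead of one per line.
     do ifile = 1, nfiles
        write(fname, '(a,i0,a)') 'chunk_', ifile, '.txt'
        open(newunit=u, file=fname, access='stream', form='formatted', &
             status='old', action='read')
        read(u, '(a)', pos=fpos(ifile)) buf(1, ifile)   ! resume at saved position
        do i = 2, chunk
           read(u, '(a)') buf(i, ifile)
        end do
        inquire(unit=u, pos=fpos(ifile))                ! remember where we stopped
        close(u)
     end do
     ! Line first+i-1 of every file is now available in buf(i, :).
     do i = 1, chunk
        do ifile = 1, nfiles
           print '(a,i0,2a)', 'line ', first + i - 1, ': ', trim(buf(i, ifile))
        end do
     end do
  end do

contains

  ! Writes small test files so the sketch is self-contained.
  subroutine make_demo_files()
    integer :: v, jfile, jline
    character(len=32) :: name
    do jfile = 1, nfiles
       write(name, '(a,i0,a)') 'chunk_', jfile, '.txt'
       open(newunit=v, file=name, status='replace', action='write')
       do jline = 1, nlines
          write(v, '(a,i0,a,i0)') 'file ', jfile, ' entry ', jline
       end do
       close(v)
    end do
  end subroutine make_demo_files
end program chunked_read
```

With a larger chunk the number of open/close calls drops proportionally, at the cost of chunk × Nfiles buffered lines in memory.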
Upvotes: 0
Reputation: 29381
If the files have fixed, i.e., constant, record lengths, you could use direct access. Then you could "directly" read a specific record. A big "if" however.
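A minimal sketch of what that could look like, assuming every line has exactly the same length and ends in a single newline byte (on files with CRLF line endings the record length would need one more byte). The file name fixed.txt and the record length of 20 are made up for the demo.

```fortran
program direct_access_demo
  implicit none
  integer, parameter :: reclen = 20   ! assumed fixed line length, newline excluded
  integer, parameter :: nlines = 5
  integer :: u, iline
  character(len=reclen) :: line

  ! Write a test file whose lines all have exactly reclen characters.
  open(newunit=u, file='fixed.txt', status='replace', action='write')
  do iline = 1, nlines
     write(line, '(a,i0)') 'entry number ', iline   ! blank-padded to reclen
     write(u, '(a)') line
  end do
  close(u)

  ! Reopen for direct access: each record is one line plus its newline byte.
  open(newunit=u, file='fixed.txt', access='direct', form='formatted', &
       recl=reclen+1, status='old', action='read')
  read(u, '(a)', rec=4) line          ! jump straight to line 4
  close(u)
  print '(a)', trim(line)
end program direct_access_demo
```

No position needs to be saved between opens here, because the record number alone identifies the line to read.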
Upvotes: 0
Reputation: 59998
Unfortunately, this is not possible this way in standard Fortran. Even if you specify
position="ASIS"
the actual position is unspecified for a unit that is not already connected, and on most systems it will in fact be the beginning of the file.
That means you have to call
read(u,*)
enough times to get to the right place in the file.
You could also use stream access. The file would again be opened at the beginning, but you can use
read(u,*,pos=n) number
where n is the position saved from the previous open. You can get the position from
inquire(unit=u, pos=n)
You would open the file with access="STREAM".
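Putting the stream-access pieces together, a minimal sketch might look like this. The file names demo_N.txt, the tiny sizes, and the array fpos are assumptions made for the demo; the files are opened as formatted streams so that each read still consumes one text line.

```fortran
program resume_stream
  implicit none
  integer, parameter :: nfiles = 3, nlines = 4   ! tiny demo sizes (hypothetical)
  integer :: fpos(nfiles)          ! saved position for each file
  integer :: u, ifile, iline
  character(len=32)  :: fname
  character(len=100) :: line

  ! Create small test files so the sketch is self-contained.
  do ifile = 1, nfiles
     write(fname, '(a,i0,a)') 'demo_', ifile, '.txt'
     open(newunit=u, file=fname, status='replace', action='write')
     do iline = 1, nlines
        write(u, '(a,i0,a,i0)') 'file ', ifile, ' line ', iline
     end do
     close(u)
  end do

  fpos = 1   ! position 1 is the start of a stream file

  do iline = 1, nlines
     do ifile = 1, nfiles
        write(fname, '(a,i0,a)') 'demo_', ifile, '.txt'
        open(newunit=u, file=fname, access='stream', form='formatted', &
             status='old', action='read')
        read(u, '(a)', pos=fpos(ifile)) line    ! resume where we stopped
        inquire(unit=u, pos=fpos(ifile))        ! remember the new position
        close(u)
        print '(a)', trim(line)
     end do
  end do
end program resume_stream
```

For formatted stream files the value given to pos= should be either 1 or a value previously returned by inquire, which is all this loop ever uses.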
Also, 500,000 open files is indeed too many. There are ways to query the system limits and to raise them (see http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/), but your compiler may also impose limits of its own.
Another solution: couldn't you store the contents of the files in memory? A couple of gigabytes is fine today, but it may not be enough for you: 500,000 files × 14,000 entries × ~100 characters is roughly 700 GB of text.
Upvotes: 3
Reputation: 59072
You can try using fseek and ftell in something like the following.
! initialize an array of 0's
do iline = 1,Nlines
   do ifile = 1,Nfiles
      ! open the file
      ! fseek(fd, array(ifile))
      ! read a line
      ! array(ifile)=ftell(fd)
      ! close the file
   enddo
enddo
The (untested) idea is to store the offset of each file in an array and position the cursor at that place upon opening the file. Then, once a line is read, ftell retrieves the current position, which is saved in memory for the next round. If all entries have the same length, you can spare the array and just store one value.
Upvotes: 0