Reputation: 96927
My C++ program reads in a textual file stream that is delimited by newline characters. For performance reasons, I am using C I/O functions to process these data. I am using fgets() to read a line of this textual file stream into a char * buffer; the buffer gets processed with other functions not relevant to this question. Lines are read in until EOF.
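For concreteness, here is a minimal sketch of the read loop (the 4096-byte line buffer and the process_line() placeholder are illustrative, not the actual sizes or functions from my program):

    #include <stdio.h>

    int main(void)
    {
        char line[4096];  /* illustrative cap; the real size is application-specific */

        /* Read lines from stdin until EOF, as described above. */
        while (fgets(line, sizeof line, stdin) != NULL) {
            /* process_line(line); -- placeholder for the real processing */
        }
        return 0;
    }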
Behind the scenes in fgets() (looking at a source code implementation for OpenBSD, for instance), it looks like this function will refill the FILE pointer's internal buffer once it runs out of characters to parse for newlines (assuming there are more characters to look through, and ignoring other termination conditions for a moment).
Problem: Profiling with gprof shows that most of the time is spent reading and processing input; the rest of the program is generally efficient. Since I am working with very large (multi-GB) inputs, I'd like to explore ways to reduce the program's total I/O overhead.
Question: Perhaps minimizing refills is one way to keep file I/O to a minimum. Is there a (platform-independent) way to adjust the size of the internal buffer that the FILE pointer uses, or would I need to write a custom fgets()-like function with my own buffer? Are there other strategies for reducing overall I/O overhead (seeks, reads, etc.) when parsing text files?
Note: I apologize for not stating this more clearly up front: my application reads from stdin (standard input) as well as from regular files.
Upvotes: 1
Views: 80
Reputation: 137398
The setbuf(3) family of functions allows you to specify the buffering for a FILE*.
Specifically, setbuffer() and setvbuf() allow you to assign a buffer you allocate yourself to be associated with the file, or you can simply specify the size and have the buffer malloc()ed for you. Of the two, setvbuf() is the one specified by the C standard, so it is the portable option.
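For instance, here is a minimal sketch of enlarging stdin's buffer with setvbuf(); the 1 MiB size is an arbitrary illustration, not a recommendation:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* setvbuf() must be called after the stream is opened but before
           any other operation is performed on it. Passing NULL for the
           buffer lets the library allocate it internally. */
        if (setvbuf(stdin, NULL, _IOFBF, 1 << 20) != 0) {  /* 1 MiB, fully buffered */
            perror("setvbuf");
            return EXIT_FAILURE;
        }

        char line[4096];
        while (fgets(line, sizeof line, stdin) != NULL) {
            /* process the line as before */
        }
        return 0;
    }

A larger buffer means fewer underlying read calls, but the returns diminish past a point; measure with your real workload to find a good size.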
See also the GNU libc documentation on Controlling Buffering.
Upvotes: 4