Alex Reynolds

Reputation: 96927

Reducing number of refills in fgets()

My C++ program reads in a textual file stream that is delimited by newline characters. For performance reasons, I am using C I/O functions to process these data. I am using fgets() to read a line of this textual file stream into a char * buffer; the buffer gets processed with other functions not relevant to this question. Lines are read in until EOF.

Behind the scenes in fgets() — looking at a source code implementation for OpenBSD, for instance — it looks like this function will refill the FILE pointer's internal buffer once it runs out of characters to parse for newlines (assuming there are more characters to look through and ignoring other termination conditions, for a moment).

Problem: From profiling with gprof, it looks like a lot of time is spent on reading in and processing input, not so much elsewhere in the program, which is generally efficient. To improve performance, I'd like to explore reducing the total I/O overhead of this program, where I am working with very large (multi-GB) inputs.

Question: Perhaps minimizing refills is one way to keep file I/O to a minimum. Is there a (platform-independent) way to adjust the size of the internal buffer that the FILE pointer uses, or would I need to write a custom fgets()-like function with my own buffer? Are there other strategies for reducing overall I/O overhead (seeks, reads, etc.) when parsing text files?


Note: I should have stated more clearly what kind of streams I am working with: my application reads from stdin (standard input) as well as from regular files.

Upvotes: 1

Views: 80

Answers (1)

Jonathon Reinhart

Reputation: 137398

The setbuf(3) family of functions allows you to control the buffering for a FILE *.

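Specifically, setvbuf (standard C) and setbuffer (a BSD extension) let you associate a buffer that you allocate with the stream. Alternatively, pass NULL as the buffer to setvbuf and let the library allocate one itself; the C standard says the size argument may then determine the size of that buffer, but implementations are not required to honor it. Either way, make the call after opening the stream and before any other operation on it.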

See also the GNU libc documentation on Controlling Buffering.
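
A minimal sketch of the setvbuf approach, for illustration only. The helper name count_lines_buffered, the 4096-byte line buffer, and the line counting that stands in for real processing are assumptions, not part of the question. The one call that matters is setvbuf, made right after fopen and before the first fgets.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the question): opens `path`, installs a
 * caller-sized stdio buffer with setvbuf(), and counts lines via fgets().
 * Returns the line count, or -1 on any error. */
long count_lines_buffered(const char *path, size_t buf_size)
{
    assert(path != NULL && buf_size > 0);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    /* The buffer must stay alive until the stream is closed. */
    char *io_buf = malloc(buf_size);
    if (io_buf == NULL) {
        fclose(fp);
        return -1;
    }

    /* Must come before the first read on fp; _IOFBF requests full buffering. */
    if (setvbuf(fp, io_buf, _IOFBF, buf_size) != 0) {
        fclose(fp);
        free(io_buf);
        return -1;
    }

    char line[4096];          /* assumed chunk size for fgets() */
    long lines = 0;
    int in_partial_line = 0;  /* set when the last chunk had no '\n' */

    while (fgets(line, sizeof line, fp) != NULL) {
        size_t len = strlen(line);
        if (len > 0 && line[len - 1] == '\n') {
            lines++;
            in_partial_line = 0;
        } else {
            in_partial_line = 1;
        }
    }
    if (in_partial_line)
        lines++;              /* final line without a trailing newline */

    int failed = ferror(fp);
    fclose(fp);               /* close first, then release the buffer */
    free(io_buf);
    return failed ? -1 : lines;
}
```

For stdin the same call applies: setvbuf(stdin, buf, _IOFBF, size) at the start of main, before anything reads from it, with a buffer that lives for the rest of the program. Whether a larger buffer actually reduces run time depends on the platform and the input source, so it is worth profiling with a few sizes, for example count_lines_buffered("input.txt", 1 << 20).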

Upvotes: 4
