Chuck Remes

Reputation: 98

Use-cases for unbuffered reads

I'm working on a new set of IO classes for Ruby (to be used as replacements for the built-in core IO classes).

While doing this clean-sheet redesign I've run into a question about buffered versus unbuffered reads. Buffered IO reduces the number of syscalls and is particularly useful when reading and parsing UTF-8, where a character may be anywhere from 1 to 4 bytes long. I can read ahead 4k, parse my encoded string, and put back any unused bytes by moving the file pointer with seek.
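To make the read-ahead-then-seek idea concrete, here is a minimal sketch for a seekable source. The helper name `read_utf8_chunk` is hypothetical, a `StringIO` stands in for a file, and the chunk size is assumed to be at least 4 bytes so that a whole character always fits in one read.

```ruby
require "stringio"

# Hypothetical helper (not from the original post): read up to chunk_size
# bytes, return the longest prefix that is valid UTF-8, and seek back over
# any trailing bytes that belong to an incomplete character.
# Assumes chunk_size >= 4.
def read_utf8_chunk(io, chunk_size = 4096)
  bytes = io.read(chunk_size)
  return nil if bytes.nil? # end of file

  # A UTF-8 character is at most 4 bytes long, so at most 3 trailing bytes
  # can belong to a character that was cut off at the read boundary.
  0.upto(3) do |trim|
    break if trim >= bytes.bytesize
    keep = bytes.bytesize - trim
    candidate = bytes.byteslice(0, keep).force_encoding(Encoding::UTF_8)
    next unless candidate.valid_encoding?

    io.seek(-trim, IO::SEEK_CUR) if trim > 0 # "put back" the unused bytes
    return candidate
  end
  raise ArgumentError, "input is not valid UTF-8"
end

# Usage: a tiny chunk size forces a read boundary inside a character.
io = StringIO.new("abc\u00E9def\u20AC".b)
chunks = []
while (chunk = read_utf8_chunk(io, 4))
  chunks << chunk
end
puts chunks.join
```

The seek call is the only part that depends on the source being seekable, which is exactly what a pipe or socket cannot offer.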

However, this option does not exist when reading from a stream such as a pipe or socket, because seeking is not allowed. I can't put those bytes back once I've read them.

So now I'm forced to read 1 (to 4) bytes at a time to complete a byte string that can be force-encoded to UTF-8. If I do larger reads, string encoding performance improves, but I'm left with unused bytes. Any unused bytes (i.e. I read too far) need to be "put back" (an "unget") by storing them in my IO object, which means I'm back to requiring buffered IO.
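A rough sketch of that "store the leftovers in the IO object" approach for a non-seekable stream follows. The class name `Utf8StreamReader` and its `read_chars` method are hypothetical names chosen for illustration; a pipe from `IO.pipe` plays the role of the stream.

```ruby
# Hypothetical wrapper (names are illustrative, not from the original post).
# Bytes of an incomplete trailing character are kept in @pending and
# prepended to the next read, because a pipe or socket cannot seek.
class Utf8StreamReader
  def initialize(io, chunk_size = 4096)
    @io = io
    @chunk_size = chunk_size
    @pending = "".b # leftover bytes, kept as a binary string
  end

  # Returns the next run of complete characters (possibly empty),
  # or nil at end of stream.
  def read_chars
    chunk = begin
      @io.readpartial(@chunk_size)
    rescue EOFError
      nil
    end
    if chunk.nil?
      raise ArgumentError, "stream ended inside a UTF-8 character" unless @pending.empty?
      return nil
    end

    data = @pending + chunk.b
    # At most 3 trailing bytes can belong to a character that was cut off.
    0.upto(3) do |trim|
      break if trim > data.bytesize
      keep = data.bytesize - trim
      candidate = data.byteslice(0, keep).force_encoding(Encoding::UTF_8)
      next unless candidate.valid_encoding?

      @pending = data.byteslice(keep, trim) # the "unget" buffer
      return candidate
    end
    raise ArgumentError, "stream is not valid UTF-8"
  end
end

# Usage: the two-byte character U+00E9 arrives split across two writes.
r, w = IO.pipe
w.binmode
reader = Utf8StreamReader.new(r)
w.write("h\xC3".b)
w.flush
first = reader.read_chars
w.write("\xA9llo".b)
w.close
second = reader.read_chars
puts first + second
```

Even this small amount of state is a read buffer in all but name, which is the tension described above.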

So, I'm wondering if I am making this decision more complex than it needs to be. If there are no good use-cases for unbuffered IO, then my path is clear. If there are good reasons for unbuffered IO, my path is also clear.

Does anyone have one or more real world use-cases for unbuffered reads?

Upvotes: 1

Views: 89

Answers (1)

Chuck Remes

Reputation: 98

After much rumination, I have decided that there are no compelling use-cases requiring unbuffered reads. So, my IO classes will default to buffering reads and will try to satisfy read requests from that cache. This will be true for both Block and Stream objects.

However, an interesting detail arose. The read cache will likely be 32k or so. Sometimes a user will try to read more than that in a single call. In that event, the cache detects the large read, invalidates itself, and then passes the read request through to a __read__ "primitive" which does unbuffered reads. By electing to make that semi-private read available as part of the API, users can bypass the read cache altogether and do a direct unbuffered read.
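The design can be sketched roughly as below. Only the __read__ name and the 32k figure come from the post; the `CachedReader` class, its `read` method, and the use of `sysread` as the unbuffered primitive are assumptions. The sketch also assumes that "invalidating" the cache on a large read means handing back any bytes already cached before passing the rest through, since a stream could not re-read them.

```ruby
require "stringio"

# Hypothetical sketch of a read cache with a pass-through for large reads.
class CachedReader
  CACHE_SIZE = 32 * 1024

  def initialize(io)
    @io = io
    @cache = "".b
  end

  # Semi-private primitive: one unbuffered read, nil at end of input.
  # On a real pipe or socket this may return fewer bytes than requested.
  def __read__(nbytes)
    @io.sysread(nbytes)
  rescue EOFError
    nil
  end

  def read(nbytes)
    if nbytes > CACHE_SIZE
      # Large read: drain the cache (assumption, see above), then bypass it.
      head = @cache
      @cache = "".b
      tail = __read__(nbytes - head.bytesize)
      return tail if head.empty?
      return tail ? head << tail.b : head
    end

    if @cache.bytesize < nbytes
      chunk = __read__(CACHE_SIZE - @cache.bytesize) # refill up to 32k
      @cache << chunk.b if chunk
    end
    return nil if @cache.empty?
    @cache.slice!(0, nbytes) # serve the request from the cache
  end
end

# Usage: a small cached read, a large pass-through read, a direct read.
reader = CachedReader.new(StringIO.new("a" * 100_000))
small  = reader.read(10)
large  = reader.read(40_000)
direct = reader.__read__(5)
puts [small.bytesize, large.bytesize, direct.bytesize].inspect
```

A caller who mixes `read` and `__read__` has to accept that bytes still sitting in the cache are skipped by the direct call, which is one reason to keep the primitive semi-private.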

So I'll offer the best of both worlds without cluttering up the API.

Upvotes: 1
