Reputation: 207
I have a problem where I want to pull discrete chunks of data from disk into a queue, and dequeue them into another process. This data is randomly located on disk, so it would not benefit substantially from sequential reads. It's a lot of data, so I can't load it all at once, nor is it efficient to pull in one block at a time.
I'd like the consumer to be able to operate at its own speed, but to keep a healthy queue of data ready for it so that I'm not constantly waiting on disk reads as I process chunks.
Is there an established way to do this, e.g. with the jobs framework or safetyvalve? Implementing this myself feels like reinventing the wheel, since a slow consumer operating on disk data is a common problem.
Any suggestions as to how best to tackle this the Erlang way?
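For context, the shape I have in mind is roughly the following: a small prefetcher process keeps a bounded queue of chunks filled, and the consumer pulls from it at its own pace, only blocking when the buffer has run dry. This is just a sketch of what I mean, not working code I have; the module name, the `ReadChunk` fun, and the buffer depth `Max` are all placeholders:

```erlang
-module(prefetcher).
-export([start_link/2, next/1]).

%% ReadChunk :: fun(() -> {ok, binary()} | eof)  -- supplied by the caller,
%% fetches one (randomly located) chunk from disk.
%% Max :: maximum number of chunks to buffer ahead of the consumer.
start_link(ReadChunk, Max) ->
    Pid = spawn_link(fun() ->
        self() ! fill,
        loop(ReadChunk, queue:new(), Max, false)
    end),
    {ok, Pid}.

%% Consumer side: blocks until a chunk (or eof) is available.
next(Pid) ->
    Pid ! {next, self()},
    receive
        {Pid, {ok, Chunk}} -> {ok, Chunk};
        {Pid, eof}         -> eof
    end.

loop(ReadChunk, Q, Max, Eof) ->
    receive
        fill when not Eof ->
            case queue:len(Q) < Max of
                true ->
                    case ReadChunk() of
                        {ok, Chunk} ->
                            self() ! fill,  % keep topping up the buffer
                            loop(ReadChunk, queue:in(Chunk, Q), Max, Eof);
                        eof ->
                            loop(ReadChunk, Q, Max, true)
                    end;
                false ->
                    %% Buffer full; a 'next' will trigger another fill.
                    loop(ReadChunk, Q, Max, Eof)
            end;
        fill ->
            loop(ReadChunk, Q, Max, Eof);
        {next, From} ->
            case queue:out(Q) of
                {{value, Chunk}, Q2} ->
                    From ! {self(), {ok, Chunk}},
                    self() ! fill,          % refill the slot we just freed
                    loop(ReadChunk, Q2, Max, Eof);
                {empty, _} when Eof ->
                    From ! {self(), eof},
                    loop(ReadChunk, Q, Max, Eof);
                {empty, _} ->
                    %% Buffer ran dry: read one chunk synchronously.
                    case ReadChunk() of
                        {ok, Chunk} ->
                            From ! {self(), {ok, Chunk}},
                            self() ! fill,
                            loop(ReadChunk, Q, Max, Eof);
                        eof ->
                            From ! {self(), eof},
                            loop(ReadChunk, Q, Max, true)
                    end
            end
    end.
```

In a real application this would presumably be a supervised gen_server rather than a bare spawn_link, which is partly why I'm asking whether an existing framework already covers this pattern.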
Upvotes: 3
Views: 198
Reputation: 9648
You can use the {read_ahead, Bytes} option on file:open/2:

{read_ahead, Size}

This option activates read data buffering. If read/2 calls are for significantly less than Size bytes, read operations towards the operating system are still performed for blocks of Size bytes. The extra data is buffered and returned in subsequent read/2 calls, giving a performance gain since the number of operating system calls is reduced. The read_ahead buffer is also highly utilized by the read_line/1 function in raw mode, which is why this option is recommended (for performance reasons) when accessing raw files using that function. If read/2 calls are for sizes not significantly less than, or even greater than, Size bytes, no performance gain can be expected.
You've been vague about the chunk sizes involved, but toying with that buffer size should be a decent start toward implementing what you need.
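For instance, opening the file in raw binary mode with a read-ahead buffer looks like this (the path, buffer size, and read sizes are arbitrary examples, not tuned values):

```erlang
%% Open with a 128 KB read-ahead buffer; consecutive small read/2
%% calls are then served from the buffer rather than each one going
%% to the operating system.
{ok, Fd} = file:open("/path/to/data.bin",
                     [read, raw, binary, {read_ahead, 128 * 1024}]),
{ok, Chunk1} = file:read(Fd, 4096),
{ok, Chunk2} = file:read(Fd, 4096),
ok = file:close(Fd).
```

As the quoted documentation notes, this only pays off when individual reads are much smaller than the buffer; for reads comparable to or larger than Size, expect no gain.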
Upvotes: 1