Reputation: 81
I'm working on a benchmark program. Upon making the read() system call, the program appears to hang indefinitely. The target file is 1 GB of binary data and I'm attempting to read directly into buffers that can be 1, 10 or 100 MB in size.
I'm using std::vector<char> to implement dynamically-sized buffers and handing off &vec[0] to read(). I'm also calling open() with the O_DIRECT flag to bypass kernel caching.
The essential coding details are captured below:
std::string fpath{"/path/to/file"};
size_t tries{};
int fd{};
while (errno == EINTR && tries < MAX_ATTEMPTS) {
fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
tries++;
}
// Throw exception if error opening file
if (fd == -1) {
ostringstream ss {};
switch (errno) {
case EACCES:
ss << "Error accessing file " << fpath << ": Permission denied";
break;
case EINVAL:
ss << "Invalid file open flags; system may also not support O_DIRECT flag, required for this benchmark";
break;
case ENAMETOOLONG:
ss << "Invalid path name: Too long";
break;
case ENOMEM:
ss << "Kernel error: Out of memory";
}
throw invalid_argument {ss.str()};
}
size_t buf_sz{1024*1024}; // 1 MiB buffer
std::vector<char> buffer(buf_sz); // Creates vector pre-allocated with buf_sz chars (bytes)
// Result is 0-filled buffer of size buf_sz
auto bytes_read = read(fd, &buffer[0], buf_sz);
Poking through the executable with gdb shows that buffers are allocated correctly, and the file I've tested with checks out in xxd. I'm using g++ 7.3.1 (with C++11 support) to compile my code on a Fedora Server 27 VM.
Why is read() hanging on large binary files?
Edit: Code example updated to more accurately reflect error checking.
Upvotes: 0
Views: 620
Reputation: 136485
Most examples of read() hanging appear to be when using pipes or non-standard I/O devices (e.g., serial). Disk I/O, not so much.
The O_DIRECT flag is useful for filesystems and block devices. With this flag people normally map pages into the user space.
For sockets, pipes and serial devices it is plain useless because the kernel does not cache that data.
Your updated code hangs because fd is initialized with 0, which is STDIN_FILENO, and the file is never opened: the loop body only runs if errno already equals EINTR, which it normally does not at that point. The program then hangs reading from stdin.
Upvotes: 0
Reputation: 1
There are multiple problems with your code.
This code will never work properly if errno ever has a value equal to EINTR:
while (errno == EINTR && tries < MAX_ATTEMPTS) {
fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
tries++;
}
That loop won't stop once the file has been opened successfully. A successful open() does not reset errno, so as long as errno is EINTR the loop keeps reopening the file and leaking file descriptors until tries reaches MAX_ATTEMPTS.
This would be better:
do
{
fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
tries++;
}
while ( ( -1 == fd ) && ( EINTR == errno ) && ( tries < MAX_ATTEMPTS ) );
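For anyone who wants the retry wrapped up in one place, here is a minimal sketch of the same idea as a function. The name open_retry and its parameters are my own invention for illustration, not anything from the question. The point it tries to show is that fd starts at -1 rather than 0, and that the loop exits on success or on any error other than EINTR.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: retry open() only when it fails with EINTR.
// Returns the descriptor, or -1 with errno left as set by the last open().
int open_retry(const char* path, int flags, int max_attempts) {
    assert(max_attempts > 0);
    int fd = -1;  // start from an invalid descriptor, never 0 (stdin)
    for (int tries = 0; tries < max_attempts; ++tries) {
        fd = open(path, flags);
        if (fd != -1 || errno != EINTR) {
            break;  // success, or a failure that retrying will not fix
        }
    }
    return fd;
}
```

In the question's code this would be called as open_retry(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE, MAX_ATTEMPTS), followed by the existing fd == -1 check.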
Second, as noted in the comments, O_DIRECT can impose alignment restrictions on memory. You might need page-aligned memory:
So
size_t buf_sz{1024*1024}; // 1 MiB buffer
std::vector<char> buffer(buf_sz); // Creates vector pre-allocated with buf_sz chars (bytes)
// Result is 0-filled buffer of size buf_sz
auto bytes_read = read(fd, &buffer[0], buf_sz);
becomes
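size_t buf_sz{1024*1024}; // 1 MiB buffer
// page-aligned buffer; the last mmap() argument is an offset, so pass 0 rather than NULL
void *buffer = mmap( nullptr, buf_sz, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
// mmap() returns MAP_FAILED on error; check for that before reading
auto bytes_read = read(fd, buffer, buf_sz);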
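If you would rather not manage an mmap() region, posix_memalign() is another way to get an aligned buffer. The sketch below is only an illustration: make_aligned_buffer, FreeDeleter and AlignedBuffer are names I made up, and the 4096-byte alignment used in the usage note is an assumption, since the alignment O_DIRECT actually requires depends on the filesystem and device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdlib.h>

// Deleter so unique_ptr releases posix_memalign() memory with free().
struct FreeDeleter {
    void operator()(void* p) const { free(p); }
};
using AlignedBuffer = std::unique_ptr<char, FreeDeleter>;

// Hypothetical helper: allocate `size` bytes aligned to `alignment`.
// `alignment` must be a power of two and a multiple of sizeof(void*).
AlignedBuffer make_aligned_buffer(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    void* p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0) {
        return AlignedBuffer{};  // allocation failed: return an empty pointer
    }
    return AlignedBuffer{static_cast<char*>(p)};
}
```

Usage would look like auto buffer = make_aligned_buffer(buf_sz, 4096); followed by read(fd, buffer.get(), buf_sz) once buffer has been checked for null. The memory is freed automatically when buffer goes out of scope.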
Note also that the Linux implementation of O_DIRECT can be very dodgy. It's been getting better, but there are still potential pitfalls that aren't very well documented at all. Along with alignment restrictions, if the last chunk of data in the file isn't a full page, for example, you may not be able to read it if the filesystem's implementation of direct IO doesn't allow you to read anything but full pages (or some other block size). Likewise for write() calls: you may not be able to write just any number of bytes, and you might be constrained to something like a 4k page.
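One way to cope with a partial tail, sketched here under the assumption that the filesystem returns a short count at end of file rather than an error, is to always request the full (aligned) buffer size and let read() report how much it delivered. The helper name drain_fd is hypothetical. It also retries on EINTR and treats a return of 0 as end of file.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: read fd to EOF in requests of buf_sz bytes.
// Returns the total bytes read, or -1 on a read() error other than EINTR.
long long drain_fd(int fd, char* buf, std::size_t buf_sz) {
    assert(buf != nullptr && buf_sz > 0);
    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, buf_sz);  // request length stays constant
        if (n > 0) {
            total += n;      // may be a short read, e.g. the file's tail
            continue;
        }
        if (n == 0) {
            return total;    // end of file
        }
        if (errno == EINTR) {
            continue;        // interrupted before any data: try again
        }
        return -1;           // real error; errno says why
    }
}
```

With O_DIRECT, buf would need to be an aligned buffer and buf_sz a multiple of the required block size. If the filesystem refuses the tail read outright, this loop returns -1 and errno (typically EINVAL) tells you so.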
This is also critical:
Most examples of read() hanging appear to be when using pipes or non-standard I/O devices (e.g., serial). Disk I/O, not so much.
Some devices simply do not support direct IO. They should return an error, but again, the O_DIRECT implementation on Linux can be very hit-or-miss.
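Since the kernel may not reject O_DIRECT cleanly on every kind of file, it can be worth checking what the path actually refers to before benchmarking. This is a rough sketch, not a guarantee of direct IO support: is_direct_io_candidate is a made-up name, and it only tests whether the path is a regular file or a block device.

```cpp
#include <cassert>
#include <sys/stat.h>

// Hypothetical helper: true if `path` names a regular file or block device,
// the two cases where O_DIRECT is normally meaningful.
bool is_direct_io_candidate(const char* path) {
    assert(path != nullptr);
    struct stat st {};
    if (stat(path, &st) != 0) {
        return false;  // cannot stat: missing path, permissions, etc.
    }
    return S_ISREG(st.st_mode) || S_ISBLK(st.st_mode);
}
```

A pipe, socket, or character device such as a serial port fails this check. If you already have an open descriptor, fstat() on the descriptor gives the same information without a second path lookup.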
Upvotes: 3
Reputation: 13085
I pasted your program and ran it on my Linux system; it worked and did not hang.
The most likely cause of the failure is that the file is not an ordinary file-system item, or that it is backed by hardware which is not working.
Try with a smaller size to confirm, and try on a different machine to help diagnose.
My complete code (with no error checking):
#include <vector>
#include <string>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
int main( int argc, char ** argv )
{
std::string fpath{"myfile.txt" };
auto fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
size_t buf_sz{1024*1024}; // 1 MiB buffer
std::vector<char> buffer(buf_sz); // Creates vector pre-allocated with buf_sz chars (bytes)
// Result is 0-filled buffer of size buf_sz
auto bytes_read = read(fd, &buffer[0], buf_sz);
}
myfile.txt was created with
dd if=/dev/zero of=myfile.txt bs=1024 count=1024
Upvotes: 0