How is data copied from user space to kernel space and vice versa during I/O tasks?

Question

I'm learning an operating system course and on slide 32: https://people.eecs.berkeley.edu/~kubitron/courses/cs162-S19/sp19/static/lectures/3.pdf

the professor briefly said fread and fwrite implement user space buffer, thus more efficient than directly calling system functions read/write and can save disk access, but didn't explain why.

Imagine these two scenarios: we need to read/write 16 bytes, user buffer is 4-byte, scenario one is using fread/fwrite, scenario two is directing using read/write which process one byte at each time

My questions are:

Since fread calls read underneath, how many read function calls will be invoked respectively?
Is data transfer, whether one single byte or 1mb between user space buffer and kernel space buffer all done by the kernel and no user/kernel mode switch involved during transferring?
How many disk accesses are performed respectively? Won't the kernel buffer come into play during scenario two?
The read function ssize_t read(int fd, void *buf, size_t count) also has buffer and count parameters, can these replace the role of user space buffer?

Brendan · Accepted Answer

Since fread calls read underneath, how many read function calls will be invoked respectively?

Because fread() is mostly just slapping a buffer (in user-space, likely in a shared library) in front of read(), the "best case number of read() system calls" will depend on the size of the buffer.

For example; with an 8 KiB buffer; if you read 6 bytes with a single fread(), or if you read 6 individual bytes with 6 fread() calls; then read() will probably be called once (to get up to 8 KiB of data into the buffer).

However; read() may return less data than was requested (and this is very common for some cases - e.g. stdin if the user doesn't type fast enough). This means that fread() might use read() to try to fill it's buffer, but read() might only read a few bytes; so fread() needs to call read() again later when it needs more data in its buffer. For a worst case (where read() only happens to return 1 byte each time) reading 6 bytes with a single fread() may cause read() to be called 6 times.

Is data transfer, whether one single byte or 1mb between user space buffer and kernel space buffer all done by the kernel and no user/kernel mode switch involved during transferring?

Often, read() (in the C standard library) calls some kind of "sys_read()" function provided by the kernel. In this case there's a switch to kernel when "sys_read()" is called, then the kernel does whatever it needs to to obtain and transfer the data, then there's one switch back from kernel to user-space.

However; nothing says that's how a kernel must work. E.g. a kernel could only provide a "sys_mmap()" (and not provide any "sys_read()") and the read() (in the C standard library) could use "sys_mmap()". For another example; with an exo-kernel, file systems might be implemented as shared libraries (with "file system cache" in shared memory) so a read() done by the C library (of a file's data that is in the "file system cache") may not involve the kernel at all.

How many disk accesses are performed respectively? Won't the kernel buffer come into play during scenario two?

There's too many possibilities. E.g.:

a) If you're reading from a pipe (where the data is in a buffer in the kernel and was previously written by a different process) then there will be no disk accesses (because the data was never on any disk to begin with).

b) If you're reading from a file and the OS cached the file's data already; then there may be no disk accesses.

c) If you're reading from a file and the OS cached the file's data already; but the file system needs to update meta-data (e.g. an "accessed time" field in the file's directory entry) then there may be multiple disk accesses that have nothing to do with the file's data.

d) If you're reading from a file and the OS hasn't cached the file's data; then at least one disk access will be necessary. It doesn't matter if it's caused by fread() attempting to read a whole buffer, read() trying to read all 6 bytes at once, or the OS fetching a whole disk block because of the first "read() of one byte" in a series of six separate "read() of one byte" requests. If the OS does no caching at all, then six separate "read() of one byte" requests will be at least 6 separate disk accesses.

e) file system code may need to access some parts of the disk to determine where the file's data actually is before it can read the file's data; and the requested file data may be split between multiple blocks/sectors on the disk; so reading 2 or more bytes from a file (regardless of whether it was caused by fread() or "read() of 2 or more bytes") could cause several disk accesses.

f) with a RAID 5/6 array involving 2 or more physical disks (where reading a "logical block" involves reading the block from one disk and also reading the parity info from a different disk), the number of disk accesses can be doubled.

The read function ssize_t read(int fd, void *buf, size_t count) also has buffer and count parameters, can these replace the role of user space buffer?

Yes; but if you're using it to replace the role of a user space buffer then you're mostly just implementing your own duplicate of fread().

It's more common to use fread() when you want treat the data as stream of bytes, and read() (or maybe mmap()) when you do not want to treat the data as a stream of bytes.

For a random example; maybe you're working with a BMP file; so you read the "guaranteed to be 14 bytes by the file format's spec" header; then check/decode/process the header; then (after determining where it is in the file, how big it is and what format it's in) you might seek() to the pixel data and read all of it into an array (then maybe spawn 8 threads to process the pixel data in the array).

How is data copied from user space to kernel space and vice versa during I/O tasks?

Answers (1)

Related Questions