Reputation: 33
I'm reading LDD3 and messing around with the kernel source code. Currently, I'm trying to fully understand struct bio and its usage.
What I have read so far:
https://lwn.net/images/pdf/LDD3/ch16.pdf
http://www.makelinux.net/books/lkd2/ch13lev1sec3
https://lwn.net/Articles/26404/
(a part of) https://www.kernel.org/doc/Documentation/block/biodoc.txt
If I understand correctly, a struct bio describes a request for some blocks to be transferred between a block device and system memory. The rules are that a single struct bio can only refer to a contiguous set of disk sectors, but system memory can be non-contiguous and is represented by a vector of <page,len,offset> triples, right? That is, a single struct bio requests the reading/writing of bio_sectors(bio) sectors (possibly many), starting at sector bio->bi_sector. The size of the data transferred is limited by the actual device, the device driver, and/or the host adapter. I can get that limit with queue_max_hw_sectors(request_queue), right? So, if I keep submitting bios that turn out to be contiguous in disk sectors, the I/O scheduler/elevator will merge these bios into a single one until that limit is reached, right?
Also, bio->bi_size must be a multiple of 512 (or the equivalent sector size) so that bio_sectors(bio) is a whole number, right?
Moreover, these bio_sectors(bio) sectors will be moved to/from system memory, and by memory we mean struct pages. Since there is no explicit mapping between <page,len,offset> and disk sectors, I assume that the entries of bio->bi_io_vec are implicitly serviced in order of appearance. That is, the first disk sectors (starting at bio->bi_sector) will be written from / read to bio->bi_io_vec[0].bv_page, then bio->bi_io_vec[1].bv_page, etc. Is that right? If so, should bio_vec->bv_len always be a multiple of sector_size or 512? Since a page is usually 4096 bytes, should bv_offset be exactly one of {0,512,1024,1536,...,3584,4096}? I mean, does it make sense, for example, to request 100 bytes to be written on a page starting at offset 200?
Also, what is the meaning of bio.bi_phys_segments and why does it differ from bio.bi_vcnt? bi_phys_segments is defined as "The number of physical segments contained within this BIO". Isn't a triple <page,len,offset> what we call a 'physical segment'?
Lastly, if a struct bio is so complex and powerful, why do we create lists of struct bio, name them struct request, and queue those requests in the request_queue? Why not have a bio_queue for the block device where each struct bio is stored until it is serviced?
I'm a bit confused so any answers or pointers to Documentation will be more than useful! Thank you in advance :)
Upvotes: 3
Views: 2837
Reputation: 705
what is the meaning of bio.bi_phys_segments?
The generic block layer can merge different segments. When two segments occupy contiguous page frames in memory and the corresponding chunks of data are also adjacent on the disk, merging them creates one larger memory area, which is called a physical segment. That is why the number of physical segments can be smaller than bi_vcnt.
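To make the merge rule concrete, here is a minimal userspace sketch, not kernel code. It assumes each bio_vec can be described by a page frame number plus offset and length; the names model_bio_vec, MODEL_PAGE_SIZE, vec_start and count_phys_segments are hypothetical. It ignores the extra limits a real queue applies (maximum segment size, segment boundary masks) and only counts runs of entries that sit back to back in physical memory. Disk adjacency is implied, since consecutive entries of one bio cover consecutive disk data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this model */

/* Hypothetical stand-in for struct bio_vec: the page is identified by
 * its page frame number instead of a struct page pointer. */
struct model_bio_vec {
    uint64_t pfn;     /* page frame number backing bv_page (assumed known) */
    uint32_t len;     /* plays the role of bv_len */
    uint32_t offset;  /* plays the role of bv_offset */
};

/* Physical address of the first byte described by one entry. */
static uint64_t vec_start(const struct model_bio_vec *bv)
{
    return bv->pfn * MODEL_PAGE_SIZE + bv->offset;
}

/* Count the physical segments: an entry starts a new segment unless it
 * begins exactly where the previous entry ended in physical memory. */
size_t count_phys_segments(const struct model_bio_vec *vecs, size_t vcnt)
{
    size_t segments = 0;

    for (size_t i = 0; i < vcnt; i++) {
        /* Each entry must stay inside its own page. */
        assert(vecs[i].offset + vecs[i].len <= MODEL_PAGE_SIZE);

        if (i == 0 ||
            vec_start(&vecs[i - 1]) + vecs[i - 1].len != vec_start(&vecs[i]))
            segments++;
    }
    return segments;
}
```

With three entries where the first two use page frames 10 and 11 in full and the third uses frame 20, this model reports two segments for a bi_vcnt of three.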
Then what is bi_hw_segments?
Yet another merge operation is allowed on architectures that handle the mapping between bus addresses and physical addresses through dedicated bus circuitry. The memory area resulting from this kind of merge operation is called a hardware segment. On the 80x86 architecture, which has no such dynamic mapping between bus addresses and physical addresses, hardware segments always coincide with physical segments.
That is, the first disk sectors (starting at bio->bi_sector) will be written from / read to bio->bi_io_vec[0].bv_page, then bio->bi_io_vec[1].bv_page, etc.
Is that right? If so, should bio_vec->bv_len always be a multiple of sector_size or 512? Since a page is usually 4096 bytes, should bv_offset be exactly one of {0,512,1024,1536,...,3584,4096}? I mean, does it make sense, for example, to request 100 bytes to be written on a page starting at offset 200?
bi_io_vec contains the page frames for the I/O, and bv_offset is the offset within the page frame. Before the actual read or write on the disk, everything is mapped to sectors, because the disk deals in sectors. This doesn't imply that the length has to be a multiple of the sector size. A length that is not a multiple results in unaligned reads/writes, which are taken care of by the underlying device driver.
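As a rough illustration of that mapping, the following userspace sketch computes where on disk a given segment begins, assuming the bi_io_vec entries are consumed strictly in order and sectors are 512 bytes. The names model_disk_pos, MODEL_SECTOR_SIZE and segment_disk_pos are hypothetical, and only the bv_len values are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SECTOR_SIZE 512u  /* assumed sector size for this model */

/* Where a segment's first byte lands on disk. */
struct model_disk_pos {
    uint64_t sector;          /* sector containing the first byte */
    uint32_t byte_in_sector;  /* 0 when the segment is sector-aligned */
};

/* bi_sector: first sector of the bio; bv_len: lengths of the entries in
 * order; idx: which entry to locate. The disk position of entry idx is
 * the bio's start plus the total length of all earlier entries. */
struct model_disk_pos segment_disk_pos(uint64_t bi_sector,
                                       const uint32_t *bv_len,
                                       size_t vcnt, size_t idx)
{
    uint64_t bytes_before = 0;
    struct model_disk_pos pos;

    assert(idx < vcnt);

    for (size_t i = 0; i < idx; i++)
        bytes_before += bv_len[i];

    pos.sector = bi_sector + bytes_before / MODEL_SECTOR_SIZE;
    pos.byte_in_sector = (uint32_t)(bytes_before % MODEL_SECTOR_SIZE);
    return pos;
}
```

For a bio starting at sector 8 with lengths 4096, 100 and 1024, the second entry starts at sector 16 on a sector boundary, while the third starts 100 bytes into sector 16. That nonzero remainder is the unaligned case described above.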
if a struct bio is so complex and powerful, why do we create lists of struct bio, name them struct request, and queue those requests in the request_queue? Why not have a bio_queue for the block device where each struct bio is stored until it is serviced?
The request queue is a per-device structure and takes care of flushing. Every block device has its own request queue, whereas the bio structure is a generic entity for I/O. If you incorporated request_queue features into bio, you would end up with a single global bio_queue, and a very heavy structure at that. Not a good idea. So basically these two structures serve different purposes in the context of an I/O operation.
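One way to picture the split is a small userspace model in which a request is just a sector range plus a chain of bios, and the per-device queue decides whether a new bio can be appended to it. This is a simplified sketch under my own assumptions, not the kernel's implementation: model_bio, model_request, request_init and request_back_merge are hypothetical names, max_sectors stands in for a limit such as the one returned by queue_max_hw_sectors(), and only back merges are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down bio: only what merging needs. */
struct model_bio {
    uint64_t sector;         /* first sector, like bi_sector */
    uint32_t nr_sectors;     /* like bio_sectors(bio) */
    struct model_bio *next;  /* chain link, like bi_next */
};

/* Hypothetical request: one contiguous sector range and its bios. */
struct model_request {
    uint64_t sector;
    uint32_t nr_sectors;
    struct model_bio *head;
    struct model_bio *tail;
};

/* Start a request from a single bio. */
void request_init(struct model_request *rq, struct model_bio *bio)
{
    bio->next = NULL;
    rq->sector = bio->sector;
    rq->nr_sectors = bio->nr_sectors;
    rq->head = rq->tail = bio;
}

/* Append bio to rq if it starts exactly where rq ends and the merged
 * request stays within max_sectors. Returns true when merged. */
bool request_back_merge(struct model_request *rq, struct model_bio *bio,
                        uint32_t max_sectors)
{
    assert(rq->tail != NULL);  /* rq must have been initialised */

    if (rq->sector + rq->nr_sectors != bio->sector)
        return false;          /* not contiguous on disk */
    if (rq->nr_sectors + bio->nr_sectors > max_sectors)
        return false;          /* would exceed the device limit */

    bio->next = NULL;
    rq->tail->next = bio;
    rq->tail = bio;
    rq->nr_sectors += bio->nr_sectors;
    return true;
}
```

In this model the bios stay untouched after a merge. Only the request grows, which is one reason to keep the per-device bookkeeping out of the bio itself.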
Hope it helps.
Upvotes: 4