Dai

Reputation: 155418

Are seeks cheaper than reads, and does forward-seeking fall foul of the sequential-access optimization?

Consider SetFilePointer. Neither the MSDN documentation nor learn.microsoft.com explains whether a forward seek counts as sequential access - this has implications for applications' IO performance.

For example, if you use CreateFile with FILE_FLAG_RANDOM_ACCESS then Win32 will use a different buffering and caching strategy compared to FILE_FLAG_SEQUENTIAL_SCAN - if you're reading a file from start to finish then you can expect better performance with the sequential-scan option than with the random-access option.

However, suppose the file format you're reading does not require every byte (or even every buffer-page) to be read into memory: for example, a flag in the file's header indicates that the next 100 bytes - or 100 kilobytes - contain no useful data. Is it wise to call ReadFile to read those 100 bytes (or 100 kilobytes - or more?) and discard them - or will it always be faster to call SetFilePointer( file, 100, NULL, FILE_CURRENT ) to skip over those 100 bytes?

If it is generally faster to use SetFilePointer, does the random-access vs sequential option make a difference? I would think that seeking forward constitutes a form of random-access because you could seek forward beyond the currently cached buffer (and any future buffers that the OS might have pre-loaded for you behind the scenes) but in that case will Windows always discard the cached buffers and re-read from disk? Is there a way to find out the maximum amount one can seek-forward without triggering a buffer reload?

(I would try to profile and benchmark to test my hypothesis, but all my computers at-hand have NVMe SSDs - obviously things will be very different on platter drives).

Upvotes: 2

Views: 314

Answers (1)

RbMm

Reputation: 33754

First, about SetFilePointer.

SetFilePointer internally calls ZwSetInformationFile with FilePositionInformation. This is handled entirely by the I/O manager - the file system is not even called. All that this call does is set CurrentByteOffset in the FILE_OBJECT to the given position.

So this call is completely independent of the file buffering and caching strategy. Moreover, it is a pointless call that only wastes time: we can always pass the offset directly in the call to ReadFile or WriteFile - see the Offset and OffsetHigh members of OVERLAPPED. What about SetEndOfFile? It is better and more efficient to call ZwSetInformationFile with FileEndOfFileInformation, or SetFileInformationByHandle with FileEndOfFileInfo. (SetEndOfFile of course internally calls ZwSetInformationFile with FileEndOfFileInformation, and before that it calls ZwQueryInformationFile with FilePositionInformation to read CurrentByteOffset from the FILE_OBJECT - so with SetEndOfFile you simply make 2-3 unnecessary extra calls into the kernel.) There is no situation in which a call to SetFilePointer is really needed.

So the file position is only a software variable (CurrentByteOffset in FILE_OBJECT) that is used primarily by the I/O manager. The file system always receives a read/write request with an explicit offset, either as an argument to FastIoRead or in IO_STACK_LOCATION.Parameters.Read.ByteOffset. The I/O manager takes this offset either from the explicit ByteOffset value passed to NtReadFile or, if ByteOffset is not present (a NULL pointer), from CurrentByteOffset in the FILE_OBJECT. ReadFile passes a NULL pointer for ByteOffset if the OVERLAPPED pointer is NULL; otherwise it passes a pointer to OVERLAPPED.Offset.

Now to the question itself: does it make sense to sequentially read all the bytes, or to just read from the needed offset?

If we open the file without caching (FILE_NO_INTERMEDIATE_BUFFERING, which is FILE_FLAG_NO_BUFFERING in Win32), we have no choice: the offset and length passed to ReadFile or WriteFile must be multiples of the sector size.
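To make the alignment requirement concrete, here is a small sketch of the arithmetic for widening an arbitrary byte range to sector boundaries. The aligned_range type and align_range function are hypothetical names for illustration; the sector size is a parameter that the caller would have to obtain for the actual volume, and the documentation also asks for a suitably aligned buffer address, which this sketch does not cover.

```c
#include <stdint.h>

/* Hypothetical helper type: a sector-aligned range covering a request. */
typedef struct {
    uint64_t offset;  /* aligned start, <= requested offset            */
    uint64_t length;  /* aligned length, a multiple of the sector size */
    uint64_t skip;    /* bytes to skip in the buffer to reach the
                         originally requested offset                   */
} aligned_range;

/* Widen [offset, offset + length) outward to sector boundaries. */
static aligned_range align_range(uint64_t offset, uint64_t length,
                                 uint64_t sector_size)
{
    aligned_range r;
    uint64_t end = offset + length;
    uint64_t aligned_end =
        ((end + sector_size - 1) / sector_size) * sector_size;

    r.offset = offset - (offset % sector_size);  /* round start down */
    r.skip   = offset - r.offset;
    r.length = aligned_end - r.offset;           /* round end up     */
    return r;
}
```

For example, a request for 50 bytes at offset 100 with 512-byte sectors becomes a read of 512 bytes at offset 0, with the wanted data starting 100 bytes into the buffer.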

If we use the cache, we still gain nothing by reading some additional bytes (that we do not need) before reading the bytes we actually need. In either case the file system has to read the needed bytes from disk if they have not been read yet - reading other bytes first does not speed this up.

With FILE_FLAG_SEQUENTIAL_SCAN the cache manager reads more sectors from disk than are needed to complete the current request, so the next read at the following sequential offset will (at least partially) hit the cache, and the number of direct reads from disk (the most expensive operation) will be lower. But when you need to read the file at a specific offset, sequentially reading the bytes before that offset does not help in any way - the bytes at that offset still have to be read.

In other words, you need to read the required bytes (at the specific offset) from the file in any case, and reading some other bytes beforehand does not increase performance; it only reduces it.

So if you need to read the file at some offset, just read at that offset. Do not use SetFilePointer; pass an explicit offset in OVERLAPPED.

Upvotes: 1
