Reputation: 155418
Consider SetFilePointer. Neither the MSDN documentation nor learn.microsoft.com explains whether a forward seek counts as sequential access or not, and this has implications for an application's I/O performance.
For example, if you use CreateFile with FILE_FLAG_RANDOM_ACCESS, Win32 will use a different buffering and caching strategy than with FILE_FLAG_SEQUENTIAL_SCAN. If you're reading a file from start to finish, you can expect better performance with the sequential-scan option than with the random-access option.
However, suppose the file format you're reading does not require every byte (or even every buffer page) to be read into memory; for example, a flag in the file's header indicates that the first 100 bytes (or 100 kilobytes) contain no useful data. Is it wise to call ReadFile to read those next 100 bytes (or 100 kilobytes, or more), or will it always be faster to call SetFilePointer( file, 100, NULL, FILE_CURRENT ) to skip over them?
If it is generally faster to use SetFilePointer, does the random-access vs. sequential option make a difference? I would think that seeking forward is a form of random access, because you could seek beyond the currently cached buffer (and any further buffers the OS might have pre-loaded for you behind the scenes). But in that case, will Windows always discard the cached buffers and re-read from disk? Is there a way to find out the maximum distance one can seek forward without triggering a buffer reload?
(I would profile and benchmark to test my hypothesis, but all the computers I have at hand use NVMe SSDs; things will obviously be very different on platter drives.)
Upvotes: 2
Views: 314
Reputation: 33754
First, about SetFilePointer.
SetFilePointer internally calls ZwSetInformationFile with FilePositionInformation. This is handled entirely by the I/O manager; the file system is not even called. All the call does is set CurrentByteOffset in the FILE_OBJECT to the given position.
So this call is completely independent of the file's buffering and caching strategy. More than that, it is a pointless call that only wastes time: we can always pass an explicit offset directly to ReadFile or WriteFile through the Offset and OffsetHigh members of OVERLAPPED.
What about SetEndOfFile, which sets the end of file at the current file position? It is much better and more efficient to call ZwSetInformationFile with FileEndOfFileInformation, or SetFileInformationByHandle with FileEndOfFileInfo. (SetEndOfFile does, of course, internally call ZwSetInformationFile with FileEndOfFileInformation, but before that it calls ZwQueryInformationFile with FilePositionInformation to read CurrentByteOffset from the FILE_OBJECT, so with SetEndOfFile you simply make two or three unnecessary extra calls into the kernel.) There is no situation in which a call to SetFilePointer is really needed.
So the file position is only a software variable (CurrentByteOffset in the FILE_OBJECT) that is used primarily by the I/O manager. The file system always receives read/write requests with an explicit offset: as an argument in FastIoRead, or in IO_STACK_LOCATION.Parameters.Read.ByteOffset.
The I/O manager takes this offset either from the explicit ByteOffset value passed to NtReadFile, or from CurrentByteOffset in the FILE_OBJECT if ByteOffset is not present (a NULL pointer for ByteOffset).
ReadFile passes a NULL ByteOffset pointer if the OVERLAPPED pointer is NULL; otherwise it passes a pointer to OVERLAPPED.Offset.
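The offset-selection rule above can be sketched as a tiny user-mode model. This is a hypothetical, simplified illustration, not kernel code: file_object_model, model_set_file_pointer and effective_read_offset are invented names standing in for FILE_OBJECT.CurrentByteOffset, what SetFilePointer does to it, and the I/O manager's choice between an explicit ByteOffset and the stored position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one FILE_OBJECT field that matters here. */
typedef struct {
    uint64_t current_byte_offset; /* models FILE_OBJECT.CurrentByteOffset */
} file_object_model;

/* Models SetFilePointer as described above: it only stores a new position.
   No file system call and no interaction with caching is modeled. */
static void model_set_file_pointer(file_object_model *fo, uint64_t position) {
    assert(fo != NULL);
    fo->current_byte_offset = position;
}

/* Models the I/O manager's choice of offset for a read request:
   an explicit ByteOffset wins; a NULL ByteOffset falls back to the stored
   position. (ReadFile passes NULL when lpOverlapped is NULL, otherwise a
   pointer to OVERLAPPED.Offset.) */
static uint64_t effective_read_offset(const file_object_model *fo,
                                      const uint64_t *byte_offset) {
    assert(fo != NULL);
    return byte_offset != NULL ? *byte_offset : fo->current_byte_offset;
}
```

In this model, "seek to 100, then read with a NULL offset" and "read with an explicit offset of 100" produce the same request for the file system; the seek is just an extra step.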
Now the actual question: is there any point in sequentially reading all the bytes, or should we just read from the offset we need?
If we open the file without caching (FILE_NO_INTERMEDIATE_BUFFERING, which corresponds to FILE_FLAG_NO_BUFFERING in CreateFile), we have no choice: the Offset and Length passed to ReadFile or WriteFile must be multiples of the sector size.
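For unbuffered I/O the wanted byte range therefore has to be widened to sector boundaries. Below is a minimal sketch of that arithmetic, assuming the sector size is already known (querying it from the volume is omitted); aligned_range and align_to_sectors are hypothetical names. Windows' file-buffering rules also require the buffer address itself to be suitably aligned, which this sketch does not handle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the smallest sector-aligned range that covers
   the requested bytes. */
typedef struct {
    uint64_t offset; /* start, rounded down to a sector boundary */
    uint64_t length; /* chosen so that offset + length ends on a sector boundary */
} aligned_range;

/* Widen [offset, offset + length) to whole sectors. */
static aligned_range align_to_sectors(uint64_t offset, uint64_t length,
                                      uint64_t sector_size) {
    assert(sector_size > 0);
    aligned_range r;
    uint64_t end = offset + length; /* one past the last wanted byte */
    uint64_t end_up = (end + sector_size - 1) / sector_size * sector_size;
    r.offset = offset - offset % sector_size; /* round the start down */
    r.length = end_up - r.offset;             /* round the end up */
    return r;
}
```

The caller would read r.length bytes at r.offset into an aligned buffer and then use the bytes starting at (offset - r.offset) within that buffer.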
If we use the cache, we still gain nothing by reading some additional (unneeded) bytes before reading the bytes we actually need. Either way, the file system has to read the needed bytes from disk if they have not been read yet, and reading other bytes first does not speed that up.
With FILE_FLAG_SEQUENTIAL_SCAN, the cache manager reads more sectors from disk than the current request needs, so the next read at a sequential offset will (at least partially) hit the cache, and the number of direct disk reads (the most expensive operation) goes down. But when you need to read the file at a specific offset, sequentially reading the bytes before that offset does not help in any way: the bytes at that offset still have to be read.
In other words, you have to read the required bytes (at the specific offset) from the file regardless, and reading other bytes before them does not increase performance; it only reduces it.
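To put rough numbers on this, here is a deliberately simplified model that counts how many fixed-size pages must be brought in for the two strategies from the question: reading everything up to the wanted range, versus reading only the wanted range. It assumes a fixed page size and ignores read-ahead and pages that are already cached; pages_covering is a hypothetical helper, not a Windows API.

```c
#include <assert.h>
#include <stdint.h>

/* Number of page-sized units that overlap the byte range [offset, offset + length).
   Simplified model: no read-ahead, nothing cached in advance. */
static uint64_t pages_covering(uint64_t offset, uint64_t length,
                               uint64_t page_size) {
    assert(page_size > 0);
    if (length == 0)
        return 0;
    uint64_t first = offset / page_size;                /* page holding the first byte */
    uint64_t last = (offset + length - 1) / page_size;  /* page holding the last byte */
    return last - first + 1;
}
```

With 4 KiB pages, skipping the first 100 KiB (102400 bytes) by reading it and then reading the next 4 KiB touches pages_covering(0, 106496, 4096) = 26 pages in this model, while reading those 4 KiB directly at offset 102400 touches pages_covering(102400, 4096, 4096) = 1 page.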
So if you need to read the file at some offset, just read at that offset. Don't use SetFilePointer; pass the explicit offset in OVERLAPPED instead.
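Filling in that explicit offset is just splitting a 64-bit byte offset into two 32-bit halves. The sketch below keeps the arithmetic portable by using a hypothetical offset_fields struct in place of OVERLAPPED; on Windows the same two values go straight into OVERLAPPED.Offset and OVERLAPPED.OffsetHigh.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two OVERLAPPED members that carry the offset. */
typedef struct {
    uint32_t Offset;     /* low 32 bits of the byte offset */
    uint32_t OffsetHigh; /* high 32 bits of the byte offset */
} offset_fields;

static offset_fields split_offset(uint64_t byte_offset) {
    offset_fields f;
    f.Offset = (uint32_t)(byte_offset & 0xFFFFFFFFu);
    f.OffsetHigh = (uint32_t)(byte_offset >> 32);
    /* Sanity check: the two halves recombine to the original offset. */
    assert((((uint64_t)f.OffsetHigh << 32) | f.Offset) == byte_offset);
    return f;
}
```

On Windows, with a handle opened for synchronous I/O, you would zero-initialize an OVERLAPPED, copy these two values into its Offset and OffsetHigh members, and call ReadFile(hFile, buf, len, &bytesRead, &ov). The read starts at that offset and ReadFile returns when it completes, with no SetFilePointer call involved.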
Upvotes: 1