Jason
Jason

Reputation: 11

Why do file IO speeds change when reading file size?

I'm creating a memory analysis program in C++ on Windows 10 using a 7200rpm HDD that essentially scans your drive and reports back on which folders are using how much space, allowing you to figure out where most of your drive storage is being used.

For efficiency reasons, I'm using C++ and my methodology is scanning the entire drive recursively, then reading each file's size in another thread so that I can both scan and analyze size at the same time. For obvious reasons, scanning is much faster than reporting on size but I've noticed that the IO speeds jump around a lot. Sometimes it'll read the size of 5000 files/second, whereas other times it'll read 10 files/second. Take a look at the video at this link. The first number is how many files' sizes have been read and the second number is how many files have been found altogether. The first number is what's important here.

Why does my file IO speed change, and is there anything I can do about this?

Upvotes: 0

Views: 578

Answers (1)

Thomas Matthews
Thomas Matthews

Reputation: 57749

You have many bottlenecks to consider, both on the processor side and on the hard drive side.

Locating The Data

Essentially, the hard drive has to locate the sectors and tracks that contain the data in the file. If you are really lucky, the data will be in sequential sectors on sequential tracks, thus causing very little head movement or repositioning. However, file data can be "scattered", and thus the hard drive will read as much as it can, then calculate the next position of the data, relocate the head to that position and keep reading. This affects the flow of the data. If you drive is intelligent and has a lot of cache, the drive could place this data into a cache and deliver data from the cache instead of the drive, possibly making up some lost nanoseconds due to repositioning.

The Data Bus

The data has to go into the PC's memory. Usually there is only one bus for the data. This bus is shared among many entities in your system, the processor and the hard-drive controller to name a few. If your lucky, your PC has a Direct Memory Access (DMA) controller for the hard drive. The controller can transfer data from the hard drive port into memory, bypassing the processor. However, the DMA controller must share the data bus with the processor (and friends). The bus arbitration is another slowdown and inconsistency.

Sharing the Drive

Many operating systems use the hard drive as virtual memory; swapping out blocks of memory. These file requests will need to be intermingled with the requests from your program.

Sequential Access

Most of the cheaper platforms have sequential access to the drive. Only one entity can read at the same time. Most drives are a single bit stream. Higher performance, custom platforms, actually have more than one drive running in parallel. Because of the sequential nature of the device, entities must either wait for another to finish or intermingle the transactions. Compared to memory that is parallel access (8 or more bits read at the same time).

Interruptions & Scheduling

There are lots of activities going on inside your PC, from internet or wifi communications to audio and video playbacks (as well as other system tasks running). These all need to run. No matter how many cores you have, there isn't enough. Most Operating Systems will run the tasks by time and priority. Very rarely will one task have exclusive ownership of a processor until the task finishes. Your task will be intermingled with other tasks that are running. Thus slowing down your program.

Chunking It

Most disk clean up utilities work in chunks or pieces of files. Speed is not as important as the quality of the data operation. For example, a smaller chunk of a file will have better success at being moved or copied than a huge chunk. The program can be interrupted (from a User, for example). Smaller chunks allow for easier recovery from an interruption.

There are probably more reasons why your program is executing slowly or has inconsistent timings, but the above information should give you better insight as to the behavior of your PC.

Upvotes: 1

Related Questions