No Idea For Name
No Idea For Name

Reputation: 11577

file copy is faster after one pass

I have a program that copies a large amount of files from several directories to on directory. the amount is not known(about 50K), but they are all on the same drive. in the program there should be a progress bar. when i wrote the program for the first time i did not wrote the progress bar and program ran slow. i toke about 15-20 min to pass the files. in order to write the progress bar i needed to know how many files do i have, so i went through the directories and listed the files. now the first ran through the files takes about 5 min, but the copy takes only 5-7 min.

can anyone explain the phenomenon? I'm sorry that i can't share the code, but it's a simple use of File.Copy and a simple c# .net 3.5 progressBar

Upvotes: 0

Views: 167

Answers (2)

Hans Passant
Hans Passant

Reputation: 941317

This approach minimizes the most expensive operation on a disk drive, moving the reader head. Disk speeds are rated by two basic mechanical constraints. One is how fast the platters spin which sets an upper bound on the data transfer speed. That's fixed. And how fast the read head can be moved to another track. The seek time, a fat dozen milliseconds to move it by one track is typical. A very long time in cpu cycles. Which makes the order in which you access disk data very important. Constantly jumping the reader head back-and-forth between the directory records and the file data clusters like you did originally is very expensive.

To what degree the data on the disk is fragmented is also very important, the reason defrag utilities exist. A drive that sees a high rate of files getting created and deleted tends to get fragmented quicker. The higher the fragmentation rate, the more disk seeks you'll need to read data from the drive.

By reading the directory entries first you can avoid a lot of seeks. They are localized in an area of the drive called the MFT, physically close to each other so far fewer long seeks. You'll read them again when you actually start copying the files, but this time they come from the file system cache. Stored in RAM when you scanned the directories the first time. So no need for an expensive seek back to the MFT.

Also notably the reason why SSDs work so much better than mechanical drives, they have a very low seek time.

Upvotes: 4

shfire
shfire

Reputation: 847

This is not a phenomenon, it is Caching

Upvotes: 0

Related Questions