Assaf Lavie

Reputation: 76063

File Copying optimization through multiple threads

Can you make file copying faster through multiple threading?

Edit: To clarify, suppose you were implementing CopyFile(src, tgt). It seems logical that under certain circumstances you could use multiple threads to make it go faster.

Edit: Some more thoughts:

Naturally, it depends on the HW/storage in question.

If you're copying from one disk to another, for example, it's pretty clear that you can read and write at the same time using two threads, thus saving the time cost of the faster of the two operations (usually reading). But you don't really need multiple threads to read and write in parallel, just async I/O.
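A minimal sketch of the two-thread idea, written in Python for brevity (an assumption: the question is about the Win32 CopyFile, which this does not reproduce). One thread reads fixed-size chunks into a bounded queue while the calling thread writes them out, so reading chunk n+1 can overlap writing chunk n. CPython releases the GIL during blocking file reads and writes, so the two threads really can run concurrently. `threaded_copy`, `CHUNK_SIZE` and `QUEUE_DEPTH` are hypothetical names chosen for illustration.

```python
import queue
import threading

CHUNK_SIZE = 1 << 20  # 1 MiB per read: an arbitrary value for this sketch
QUEUE_DEPTH = 4       # max chunks buffered between reader and writer


def threaded_copy(src_path, dst_path, chunk_size=CHUNK_SIZE, depth=QUEUE_DEPTH):
    """Copy a file using one reader thread and one writer (the caller's thread)."""
    chunks = queue.Queue(maxsize=depth)  # bounded: the reader blocks if the writer lags
    stop = threading.Event()             # set once the writer side is done or has failed
    reader_errors = []

    def reader():
        try:
            with open(src_path, "rb") as src:
                while not stop.is_set():
                    data = src.read(chunk_size)
                    if not data:
                        break
                    chunks.put(data)  # blocks while the queue is full
        except Exception as exc:      # hand the error over to the writer side
            reader_errors.append(exc)
        finally:
            chunks.put(None)          # sentinel: no more data is coming

    thread = threading.Thread(target=reader)
    thread.start()
    try:
        with open(dst_path, "wb") as dst:
            while (data := chunks.get()) is not None:
                dst.write(data)       # overlaps with the reader's next read
    finally:
        stop.set()
        # If the writer failed early, the reader may be blocked in put();
        # keep draining the queue until the reader thread has exited.
        while thread.is_alive():
            try:
                chunks.get(timeout=0.1)
            except queue.Empty:
                pass
        thread.join()
    if reader_errors:
        raise reader_errors[0]
```

The bounded queue is the design choice that matters: it caps memory use and makes the reader wait when the writer falls behind. Whether this beats a plain read/write loop depends entirely on the devices involved.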

But if async-IO can really speed things up (up to 2x) when reading/writing from different disks, why isn't this the default implementation of CopyFile? (or is it?)
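The "up to 2x" figure follows from a simple idealized model (an assumption: per-chunk overhead, the first read and last write that cannot overlap, and any contention between the two devices are all ignored). Let $T_r$ and $T_w$ be the total time spent reading and writing:

$$T_{\mathrm{seq}} = T_r + T_w, \qquad T_{\mathrm{overlap}} \approx \max(T_r, T_w)$$

$$S = \frac{T_{\mathrm{seq}}}{T_{\mathrm{overlap}}} \approx \frac{T_r + T_w}{\max(T_r, T_w)} = 1 + \frac{\min(T_r, T_w)}{\max(T_r, T_w)} \le 2$$

The time saved is $\min(T_r, T_w)$, and the speedup reaches 2 only when reading and writing take equally long; if one side is much slower, the gain shrinks toward nothing.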

Upvotes: 5

Views: 5208

Answers (6)

Michael Burr

Reputation: 340306

You can see a benefit particularly if the files are on different devices, in which case the I/O can be very effectively overlapped.

However, there are also cases where you could easily cause thrashing of the hardware, so I don't think it's an optimization that should be taken lightly.

As far as the additional question you added:

But if async-IO can really speed things up (up to 2x) when reading/writing from different disks, why isn't this the default implementation of CopyFile? (or is it?)

I don't know the internals of CopyFile(), but I wouldn't be surprised if they don't do it, for a couple of reasons:

  1. If they were to implement it using an additional thread (or threads), that might be a bit more intrusive to a process than is appropriate (especially if the process has been single-threaded up to this point).
  2. If they were to try to implement it using asynchronous I/O with a single thread (as ChrisW indicated is a possibility), they might be as likely to cause thrashing problems as to improve performance. It might not be easy to determine generically when you'll get a benefit as opposed to a detriment.

This is not to say it couldn't or shouldn't be done (or even that it isn't done - I don't know) - these are just a couple possible reasons why it might not be done.

Upvotes: 2

Nir

Reputation: 29594

Here's a blog post about file copy performance improvements in Vista SP1:

http://blogs.technet.com/markrussinovich/archive/2008/02/04/2826167.aspx

Doing high-performance file copying is crazily complicated: you have to take into account things like cache behavior and network driver limitations.

So always use the OS file copy function (under Windows it's CopyFileEx) and don't write your own.
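The same advice carries over to higher-level languages: delegate to the standard library's copy routine instead of a hand-written loop. A minimal Python sketch, where the wrapper name `copy_file` and its `preserve_metadata` flag are hypothetical. Per the Python documentation, since 3.8 `shutil.copyfile` and `shutil.copy2` may use platform-specific fast-copy calls internally (for example `os.sendfile` on Linux or `fcopyfile` on macOS).

```python
import shutil


def copy_file(src_path, dst_path, preserve_metadata=True):
    """Let the standard library pick the platform's fast path for the copy.

    shutil.copyfile copies file contents only; shutil.copy2 additionally
    tries to preserve permission bits and timestamps. Both return dst_path.
    """
    if preserve_metadata:
        return shutil.copy2(src_path, dst_path)
    return shutil.copyfile(src_path, dst_path)
```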

Upvotes: 3

ChrisW

Reputation: 56123

If you were implementing CopyFile, then instead of using multiple threads (e.g. one thread for reading and another for writing), you could use a single thread that initiates asynchronous I/O (so that one thread can initiate and reinitiate reads and writes simultaneously), using I/O completion ports or a similar mechanism.
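A rough sketch of that single-thread control flow, with the assumptions stated up front: Python has no portable asynchronous file API, so the blocking `os.pread`/`os.write` calls are handed to asyncio's default thread pool to emulate overlapped I/O. What the sketch shows is the scheduling: one coroutine always keeps the next read and the current write in flight together, and it alone decides when to issue each. A native Windows implementation would instead use ReadFile/WriteFile with OVERLAPPED structures and an I/O completion port. `overlapped_copy` and `_write_all` are hypothetical names, and `os.pread` is Unix-only.

```python
import asyncio
import os

CHUNK_SIZE = 1 << 20  # arbitrary chunk size for this sketch


def _write_all(fd, data):
    """os.write may write fewer bytes than requested; loop until the chunk is done."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


async def overlapped_copy(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Keep one read and one write in flight at once from a single coroutine."""
    loop = asyncio.get_running_loop()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src_fd, dst_fd = src.fileno(), dst.fileno()
        read_fut = write_fut = None
        try:
            offset = 0
            # Emulated async read: the blocking os.pread runs in the default executor.
            read_fut = loop.run_in_executor(None, os.pread, src_fd, chunk_size, offset)
            while True:
                data = await read_fut
                read_fut = None
                if write_fut is not None:
                    await write_fut  # one outstanding write keeps the output in order
                    write_fut = None
                if not data:
                    break            # EOF, and the last write has completed
                offset += len(data)
                # Reissue: the next read and this chunk's write proceed concurrently.
                read_fut = loop.run_in_executor(None, os.pread, src_fd, chunk_size, offset)
                write_fut = loop.run_in_executor(None, _write_all, dst_fd, data)
        finally:
            # Never let the files close while an executor thread may still use them.
            pending = [f for f in (read_fut, write_fut) if f is not None]
            if pending:
                await asyncio.gather(*pending, return_exceptions=True)
```

Usage: `asyncio.run(overlapped_copy("in.bin", "out.bin"))`.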

For improved performance, it might be implemented entirely in the kernel.
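On Linux, one concrete example of an in-kernel copy is the copy_file_range system call, exposed in Python (3.8+, Linux only) as `os.copy_file_range`. The kernel moves the data between the two descriptors without passing it through user-space buffers, and some filesystems may accelerate it further (reflinks, server-side copy). This is a sketch under those assumptions: `kernel_copy` and the set of errno values that trigger a fallback are hypothetical choices, and when the call is missing or unsupported the copy falls back to `shutil.copyfileobj`.

```python
import errno
import os
import shutil

# errno values treated as "the kernel/filesystem can't do this here" rather than
# a real I/O failure. EPERM is included because seccomp sandboxes may return it;
# a genuine permission problem would simply recur in the user-space fallback.
_FALLBACK_ERRNOS = {errno.ENOSYS, errno.EXDEV, errno.EINVAL,
                    errno.EOPNOTSUPP, errno.EPERM}


def kernel_copy(src_path, dst_path):
    """Copy a regular file in-kernel if possible; return which path was used."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if hasattr(os, "copy_file_range"):
            try:
                remaining = os.fstat(src.fileno()).st_size
                while remaining > 0:
                    # With no explicit offsets, both descriptors' positions advance.
                    n = os.copy_file_range(src.fileno(), dst.fileno(), remaining)
                    if n == 0:
                        break  # source shrank underneath us: stop at EOF
                    remaining -= n
                return "kernel"
            except OSError as exc:
                if exc.errno not in _FALLBACK_ERRNOS:
                    raise
                # Start over from the beginning with an ordinary copy.
                src.seek(0)
                dst.seek(0)
                dst.truncate()
        shutil.copyfileobj(src, dst)
        return "user-space"
```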

Upvotes: 1

Nir

Reputation: 29594

It depends, but generally no: your bottleneck is going to be disk I/O, and you can't make disk I/O faster by using multiple threads.

Even in the extremely rare cases where this does work, the thread synchronization code would have to be so complicated that it wouldn't be worth it.

Upvotes: 1

sblundy

Reputation: 61424

I would think not. There's so little for the CPU to do; copying is almost entirely I/O-bound.

Upvotes: 2

Otávio Décio

Reputation: 74290

If you are not careful, you can make it slower. Disks are good at sequential access; if you have multiple threads, the disk heads will be seeking all over the place. If you are dealing with a high-performance SAN, on the other hand, you may see an improvement in performance, since the SAN will take care of optimizing the disk access.
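For comparison, this is roughly what copying with "multiple threads" usually means: the file is split into byte ranges and each worker copies its own range with positional I/O. On a single spinning disk this is exactly the access pattern that sends the heads seeking; on storage that handles many outstanding requests well (such as a SAN) it may help. A minimal sketch, assuming a Unix system (`os.pread`/`os.pwrite`); `parallel_range_copy`, `workers` and `chunk_size` are hypothetical, and only a measurement on the target hardware can say whether it beats a sequential copy.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_range_copy(src_path, dst_path, workers=4, chunk_size=1 << 20):
    """Copy a file by letting several threads copy disjoint byte ranges."""
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        size = os.fstat(src_fd).st_size
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.ftruncate(dst_fd, size)  # pre-size so each range is written in place

            def copy_range(start):
                # Positional I/O: threads never share (or race on) a file offset.
                end = min(start + chunk_size, size)
                pos = start
                while pos < end:
                    data = os.pread(src_fd, end - pos, pos)
                    if not data:
                        raise EOFError(f"source ended early at byte {pos}")
                    done = 0
                    while done < len(data):  # pwrite may write less than asked
                        done += os.pwrite(dst_fd, data[done:], pos + done)
                    pos += len(data)

            with ThreadPoolExecutor(max_workers=workers) as pool:
                # list() consumes the results so a worker's exception is re-raised here
                list(pool.map(copy_range, range(0, size, chunk_size)))
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```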

Upvotes: 4
