Pete

Reputation: 95

Copy (multiple) files to multiple locations

Using C# (.NET 4.5) I want to copy a set of files to multiple locations (e.g. the contents of a folder to 2 USB drives attached to the computer).
Is there a more efficient way of doing that than just using foreach loops and File.Copy?

Working towards a (possible) solution.

My first thought was some kind of multi-threaded approach. After some reading and research I discovered that just blindly setting up some kind of parallel and/or async process is not a good idea when it comes to IO (as per Why is Parallel.ForEach much faster then AsParallel().ForAll() even though MSDN suggests otherwise?).

The bottleneck is the disk, especially if it's a traditional drive, as it can really only service one read/write at a time. That got me thinking: what if I read the data once and then write it out to multiple locations? After all, in my USB drive scenario I'm dealing with multiple (output) disks.

I'm having trouble figuring out how to do that though. One idea I saw (Copy same file from multiple threads to multiple destinations) was to read all the bytes of each file into memory, then loop through the destinations and write out the bytes to each location before moving on to the next file. That seems like a bad idea if the files might be large. Some of the files I'll be copying will be videos and could be 1 GB (or more). I can't imagine it's a good idea to load a 1 GB file into memory just to copy it to another disk.

So, allowing flexibility for larger files, the closest I've gotten is below (based on How to copy one file to many locations simultaneously). The problem with this code is that it still isn't a single read with multiple writes; it's currently multi-read and multi-write. Is there a way to optimise this further? Could I read chunks into memory, then write each chunk to every destination before moving on to the next chunk (like the idea above, but with chunked files instead of whole ones)?

files.ForEach(fileDetail =>
    Parallel.ForEach(fileDetail.DestinationPaths, new ParallelOptions(),
        destinationPath =>
        {
            using (var source = new FileStream(fileDetail.SourcePath, FileMode.Open, FileAccess.Read, FileShare.Read))
            using (var destination = new FileStream(destinationPath, FileMode.Create))
            {
                var buffer = new byte[1024];
                int read;

                while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                {
                    destination.Write(buffer, 0, read);
                }
            }
        }));

Upvotes: 3

Views: 1891

Answers (2)

Pete

Reputation: 95

I thought I'd post my current solution for anyone else who comes across this question.

If anyone discovers a more efficient/quicker way to do this then please let me know!

My code seems to copy files a bit quicker than just running the copy synchronously but it's still not as fast as I'd like (nor as fast as I've seen some other programs do it). I should note that performance may vary depending on .NET version and your system (I'm using Win 10 with .NET 4.5.2 on a 13" MBP with 2.9GHz i5 (5287U - 2 core / 4 thread) + 16GB RAM). I've not even figured out the best combination of method (e.g. FileStream.Write, FileStream.WriteAsync, BinaryWriter.Write) and buffer size yet.

foreach (var fileDetail in files)
{
    foreach (var destinationPath in fileDetail.DestinationPaths)
        Directory.CreateDirectory(Path.GetDirectoryName(destinationPath));

    // Set up progress
    FileCopyEntryProgress progress = new FileCopyEntryProgress(fileDetail);

    // Set up the source and outputs
    using (var source = new FileStream(fileDetail.SourcePath, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize, FileOptions.SequentialScan))
    using (var outputs = new CompositeDisposable(fileDetail.DestinationPaths.Select(p => new FileStream(p, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize))))
    {
        // Set up the copy operation
        var buffer = new byte[bufferSize];
        int read;

        // Read the file
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Copy to each drive
            await Task.WhenAll(outputs.Select(async destination => await ((FileStream)destination).WriteAsync(buffer, 0, read)));

            // Report progress
            if (onDriveCopyFile != null)
            {
                progress.BytesCopied = read;
                progress.TotalBytesCopied += read;

                onDriveCopyFile.Report(progress);
            }
        }
    }

    if (ct.IsCancellationRequested)
        break;
}
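On the buffer-size question above: rather than guessing, it can be measured. Here's a rough sketch of how I'd time a plain single-destination copy at a given buffer size (`CopyWithBuffer` is a hypothetical helper I made up for measuring, not part of the solution itself):

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class CopyBenchmark
{
    // Copies src to dst using one read/write buffer of the given size
    // and returns the elapsed time in milliseconds.
    public static long CopyWithBuffer(string src, string dst, int bufferSize)
    {
        var sw = Stopwatch.StartNew();

        using (var source = new FileStream(src, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize, FileOptions.SequentialScan))
        using (var destination = new FileStream(dst, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize))
        {
            var buffer = new byte[bufferSize];
            int read;

            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                destination.Write(buffer, 0, read);
        }

        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

Running it over the same large file with, say, 4096 (the FileStream default internal buffer), 81920 (the Stream.CopyTo default) and 1 MB gives a rough feel for the sweet spot on a given drive.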

I'm using CompositeDisposable from Reactive Extensions (https://github.com/Reactive-Extensions/Rx.NET).
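For reference, all CompositeDisposable does here is hold a collection of IDisposables and dispose them together. If you don't want the Rx dependency just for that, a minimal stand-in is only a few lines (my simplified sketch; the real Rx type also handles thread safety and add-after-dispose, which this doesn't):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Minimal stand-in for Rx's CompositeDisposable: wraps a set of
// IDisposables and disposes them all when it is disposed itself.
public sealed class SimpleCompositeDisposable : IDisposable, IEnumerable<IDisposable>
{
    private readonly List<IDisposable> _items;

    public SimpleCompositeDisposable(IEnumerable<IDisposable> items)
    {
        _items = new List<IDisposable>(items);
    }

    public void Dispose()
    {
        foreach (var item in _items)
            item.Dispose();
        _items.Clear();
    }

    public IEnumerator<IDisposable> GetEnumerator() { return _items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```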

Upvotes: 2

VMAtm

Reputation: 28356

IO operations should generally be treated as asynchronous, since the actual work is done by the hardware, outside your code. You can introduce async/await for the read/write operations so execution can continue while the hardware operation is in flight:

while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
    await destination.WriteAsync(buffer, 0, read);
}

You must also mark your lambda delegate as async to make this work:

async destinationPath => 
...

And you should await the resulting tasks all the way up the call chain. You can find more information here:

Parallel foreach with asynchronous lambda

Nesting await in Parallel.ForEach
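Putting those pieces together: since Parallel.ForEach ignores the Task an async lambda returns, one way to actually await every copy is Task.WhenAll over the destinations. A sketch (the method name and 81920 buffer size are my choices, not from the question):

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class AsyncCopy
{
    // Copies one source file to every destination path asynchronously.
    // Task.WhenAll observes all the returned tasks, which Parallel.ForEach
    // would silently discard for an async lambda.
    public static Task CopyToAllAsync(string sourcePath, string[] destinationPaths)
    {
        return Task.WhenAll(destinationPaths.Select(async destinationPath =>
        {
            using (var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read, 81920, useAsync: true))
            using (var destination = new FileStream(destinationPath, FileMode.Create, FileAccess.Write, FileShare.None, 81920, useAsync: true))
            {
                var buffer = new byte[81920];
                int read;

                while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
                    await destination.WriteAsync(buffer, 0, read);
            }
        }));
    }
}
```

Note this is still a read per destination; the chunked single-read approach in the accepted answer avoids that.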

Upvotes: 1
