JMR

Reputation: 37

Upload Files in Parallel with WebClient

I need to use WebClient in a project to split a file into multiple parts and upload them in parallel. So far I can upload the parts one at a time, but I'm not sure how to upload them in parallel.

I have an UploadPart method which looks like this:

private async Task<PartETag> UploadPart(string filePath, string preSignedUrl, int partNumber)
{
    WebClient wc = new();
    wc.UploadProgressChanged += WebClientUploadProgressChanged;
    wc.UploadFileCompleted += WebClientUploadCompleted;
    _ = await wc.UploadFileTaskAsync(new Uri(preSignedUrl), "PUT", filePath);

    // Obtain the WebHeaderCollection instance containing the header name/value pair from the response.
    WebHeaderCollection myWebHeaderCollection = wc.ResponseHeaders;
    string formattedETag = myWebHeaderCollection.GetValues("ETag").FirstOrDefault().Replace(@"""", "");
    PartETag partETag = new(partNumber, formattedETag);

    return partETag;
}

It's called inside a foreach loop:

foreach (var part in parts)
{
    var partETag = await UploadPart(part.FilePath, part.PresignedUrl, part.Number);
    partETags.Add(partETag);
}

How can I modify this so that I upload parts in parallel (up to a max of 10 parts at once) while still returning the PartETag values in the response header?

Upvotes: 1

Views: 700

Answers (1)

David Peden

Reputation: 18444

This is a perfect scenario for TPL Dataflow:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var parts = new List<Part>();
var partEtags = new List<PartETag>();

var transformBlock = new TransformBlock<Part, PartETag>
(
    async part => await UploadPart(part.FilePath, part.PresignedUrl, part.Number),
    new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 10}
);

var actionBlock = new ActionBlock<PartETag>(partETag => partEtags.Add(partETag));

transformBlock.LinkTo(actionBlock, new DataflowLinkOptions {PropagateCompletion = true});

foreach (Part part in parts)
{
    transformBlock.Post(part);
}

transformBlock.Complete();

await actionBlock.Completion;

I made some assumptions about your classes since you didn't show all of your code. The parts list at the top obviously needs to have instances in it.
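
For illustration, the assumed types and a populated parts list might look like the minimal sketch below. Everything here is hypothetical: the record shapes, the file paths, and the URLs are placeholders. The Part property names follow the question's foreach loop (FilePath, PresignedUrl, Number), so rename them if your class differs, and PartETag is only a stand-in for whatever type your project already provides.

```csharp
using System;
using System.Collections.Generic;

// Populate the list before posting anything to the pipeline.
// The paths and URLs are placeholders, not real endpoints.
var parts = new List<Part>
{
    new Part(@"C:\temp\file.part1", "https://example.com/upload?partNumber=1", 1),
    new Part(@"C:\temp\file.part2", "https://example.com/upload?partNumber=2", 2),
};

Console.WriteLine($"{parts.Count} parts queued");

// Hypothetical shape of a part: local file, pre-signed URL, part number.
public sealed record Part(string FilePath, string PresignedUrl, int Number);

// Stand-in for the PartETag type used in the question (part number plus ETag).
public sealed record PartETag(int PartNumber, string ETag);
```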

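This code creates a dataflow pipeline that does the work asynchronously and caps parallel uploads at 10. The blocks are linked with completion propagated, so awaiting the action block's completion guarantees that every part has been uploaded and its result collected. The action block processes one item at a time by default, so adding to the list needs no locking.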

Once that's done, your partEtags list will contain all of your results.
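
If you want to convince yourself of the throttling before pointing this at real uploads, here is a rough, self-contained sketch of the same pipeline shape. It is not your upload code: a Task.Delay stands in for UploadPart, plain part numbers stand in for Part and PartETag, and the counters (inFlight, peak) are names made up for this demo. It assumes a runtime where System.Threading.Tasks.Dataflow is available (on .NET Framework it comes from the NuGet package of the same name).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var gate = new object();
int inFlight = 0;   // simulated uploads currently running
int peak = 0;       // highest concurrency observed
var results = new List<int>();

var transformBlock = new TransformBlock<int, int>(
    async partNumber =>
    {
        lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
        await Task.Delay(50);   // stands in for the real upload
        lock (gate) { inFlight--; }
        return partNumber;
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

// Default ActionBlock options run one item at a time, so List.Add is safe here.
var actionBlock = new ActionBlock<int>(n => results.Add(n));

transformBlock.LinkTo(actionBlock, new DataflowLinkOptions { PropagateCompletion = true });

foreach (int n in Enumerable.Range(1, 25))
{
    transformBlock.Post(n);
}

transformBlock.Complete();
await actionBlock.Completion;

Console.WriteLine($"peak concurrency: {peak}");   // never more than 10
Console.WriteLine($"results in input order: {results.SequenceEqual(Enumerable.Range(1, 25))}");
```

The peak never goes above MaxDegreeOfParallelism, and with the default block options the transform block hands results to the next block in the order the inputs were posted, so the collected list lines up with the part numbers.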

Upvotes: 1
