Reputation: 1039
I'm trying to copy large set of files from one S3 to another S3, using asynchronous method. To achieve the same, the large set of files is broken into batches and each batch is handed over to a list of async method. The issue is, each async method is not processing more than 1 file in the batch, whereas each batch contains more than 1k files, not sure why async doesn't go back to process the remaining files.
Here is the code:
public void CreateAndExecuteSpawn(string srcBucket, List<List<string>> pdfFileList, IAmazonS3 s3client)
{
int i = 0;
List<Action> actions = new List<Action>();
LambdaLogger.Log("PDF Set count: " + pdfFileList.Count.ToString());
foreach (var list in pdfFileList)
actions.Add(() => RenameFilesAsync(srcBucket, list, s3client));
foreach (var method in actions)
{
method.Invoke();
LambdaLogger.Log("Mehtod invoked: "+ i++.ToString());
}
}
public async void RenameFilesAsync(string srcBucket, List<string> pdfFiles, IAmazonS3 s3client)
{
LambdaLogger.Log("In RenameFileAsync method");
CopyObjectRequest copyRequest = new CopyObjectRequest
{
SourceBucket = srcBucket,
DestinationBucket = srcBucket
};
try
{
foreach (var file in pdfFiles)
{
if (!file.Contains("index.xml"))
{
string[] newFilename = file.Split('{');
string[] destKey = file.Split('/');
copyRequest.SourceKey = file;
copyRequest.DestinationKey = destKey[0] + "/" + destKey[1] + "/Renamed/" + newFilename[1];
LambdaLogger.Log("About to rename File: " + file);
//Here after copying one file, function doesn't return to foreach loop
CopyObjectResponse response = await s3client.CopyObjectAsync(copyRequest);
//await s3client.CopyObjectAsync(copyRequest);
LambdaLogger.Log("Rename done: ");
}
}
}
catch(Exception ex)
{
LambdaLogger.Log(ex.Message);
LambdaLogger.Log(copyRequest.DestinationKey);
}
}
public void FunctionHandler(S3Event evnt, ILambdaContext context)
{
//Some code here
CreateAndExecuteSpawn(bucket, pdfFileSet, s3client);
}
Upvotes: 0
Views: 230
Reputation: 456457
First you need to fix the batch so that it will process the batches one at a time. Avoid async void
; use async Task
instead:
public async Task CreateAndExecuteSpawnAsync(string srcBucket, List<List<string>> pdfFileList, IAmazonS3 s3client)
{
int i = 0;
List<Func<Task>> actions = new();
LambdaLogger.Log("PDF Set count: " + pdfFileList.Count.ToString());
foreach (var list in pdfFileList)
actions.Add(() => RenameFilesAsync(srcBucket, list, s3client));
foreach (var method in actions)
{
await method();
LambdaLogger.Log("Mehtod invoked: "+ i++.ToString());
}
}
public async Task RenameFilesAsync(string srcBucket, List<string> pdfFiles, IAmazonS3 s3client)
Then you can add asynchronous concurrency within each batch. The current code is just a foreach
loop, so of course it only processes one at a time. You can change this to be asynchronously concurrent by Select
ing the tasks to run and then doing a Task.WhenAll
at the end:
LambdaLogger.Log("In RenameFileAsync method");
CopyObjectRequest copyRequest = new CopyObjectRequest
{
SourceBucket = srcBucket,
DestinationBucket = srcBucket
};
try
{
var tasks = pdfFiles
.Where(file => !file.Contains("index.xml"))
.Select(async file =>
{
string[] newFilename = file.Split('{');
string[] destKey = file.Split('/');
copyRequest.SourceKey = file;
copyRequest.DestinationKey = destKey[0] + "/" + destKey[1] + "/Renamed/" + newFilename[1];
LambdaLogger.Log("About to rename File: " + file);
CopyObjectResponse response = await s3client.CopyObjectAsync(copyRequest);
LambdaLogger.Log("Rename done: ");
})
.ToList();
await Task.WhenAll(tasks);
}
Upvotes: 2