DaiKeung

Reputation: 1167

Azure Function (C#): How to copy lots of files from blob container A to another blob container B? (Function has a 10-minute timeout)

I would like to use an Azure Function to copy lots of files from blob container A to another blob container B. However, some files were missed because the Function timed out. Is there any way to resume the copy smartly? Is there any indicator on the source blob storage that identifies blobs already copied/handled, so that the next Function run can skip them?

Upvotes: 1

Views: 1082

Answers (2)

Kashyap

Reputation: 17441

It's a problem of plenty. You can:

  1. Copy from the command line or code: AZ CLI, azcopy, or the .NET SDK (the same applies to the other language SDKs).
  2. Use Storage Explorer.
  3. Use Azure Data Factory, as Bowman suggested.
  4. Use SSIS.
  5. [Mis]use Databricks, especially if you are dealing with a massive amount of data and need scalability.
  6. Write some code using the new "Put XXX from URL" APIs. E.g. "Put Blob from URL" creates a new blob; "Put Block from URL" creates a block in a block blob.
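Option 6 can be sketched with the azure-storage-blob Python SDK, whose `start_copy_from_url` call asks the service to perform the copy on its side. This is a minimal sketch under assumptions: container names and the connection string are placeholders, and for a private source container the source URL would additionally need a SAS token, omitted here.

```python
# Sketch of a server-side container copy (option 6) using azure-storage-blob.
# All names (connection string, containers) are placeholders.

def dest_name(source_name, prefix=""):
    """Map a source blob name to its destination name (identity by default)."""
    return prefix + source_name

def copy_container(conn_str, src_container_name, dst_container_name):
    # Imported lazily so the pure helper above works without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    svc = BlobServiceClient.from_connection_string(conn_str)
    src = svc.get_container_client(src_container_name)
    dst = svc.get_container_client(dst_container_name)
    for blob in src.list_blobs():
        src_url = src.get_blob_client(blob.name).url
        # The service copies blob-to-blob on its own side; no blob data
        # flows through the machine running this script.
        dst.get_blob_client(dest_name(blob.name)).start_copy_from_url(src_url)
```

Because the copy happens server-side, this approach avoids the download-then-upload round trip described below for options 1 and 2.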

#1 and #2 would use your local machine's internet bandwidth (download to local, then upload), whereas #3, #4, and #5 run entirely in the cloud. So even if your source and destination are in the same region, with #1 and #2 you'll end up paying egress charges, whereas with #3, #4, and #5 you won't.

Using Azure Functions to copy files is probably the worst thing you can do. Azure Functions cost is proportional to execution time (and memory usage). Since the job is already taking more than 10 minutes, I assume you're moving a large amount of data, so you'd be paying for the Azure Function while it just sits waiting on I/O for each file transfer.

Upvotes: 1

suziki

Reputation: 14088

I would like to use an Azure Function to copy lots of files from blob container A to another blob container B. However, some files were missed because the Function timed out.

You can avoid this timeout problem by changing the hosting plan. For example, if you use the App Service plan and turn on Always On, you can remove the timeout restriction entirely. But to be honest, if you have a lot of files and the copy takes a long time, an Azure Function is not the recommended approach (tasks performed by a Function should be lightweight).
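On the Dedicated (App Service) plan, the timeout can be lifted in host.json; a sketch of the relevant setting (a value of "-1" removes the limit on Dedicated/Premium plans, but is not permitted on the Consumption plan, where the maximum remains 10 minutes):

```json
{
  "version": "2.0",
  "functionTimeout": "-1"
}
```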

Is there any indicator on the source blob storage that identifies blobs already copied/handled, so that the next Function run can skip them?

Yes, of course you can. Just add custom metadata to the blob after it has been copied. The next time you copy files, check that custom metadata first and skip any blob that already carries it.
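The metadata-marker idea can be sketched as follows with the azure-storage-blob Python SDK. The metadata key `copied` and all resource names are assumptions for illustration, not part of the original answer; any key/value pair you choose would work the same way.

```python
# Sketch: skip blobs a previous (timed-out) run already handled, using a
# custom metadata flag on the source blob. Names are placeholders.

def already_copied(metadata):
    """True if the blob's custom metadata marks it as handled."""
    return bool(metadata) and metadata.get("copied") == "true"

def copy_new_blobs(conn_str, src_container_name, dst_container_name):
    # Imported lazily so the pure helper above works without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    svc = BlobServiceClient.from_connection_string(conn_str)
    src = svc.get_container_client(src_container_name)
    dst = svc.get_container_client(dst_container_name)
    # include=["metadata"] returns each blob's custom metadata in the listing.
    for blob in src.list_blobs(include=["metadata"]):
        if already_copied(blob.metadata):
            continue  # handled by a previous run; skip it
        src_blob = src.get_blob_client(blob.name)
        dst.get_blob_client(blob.name).start_copy_from_url(src_blob.url)
        # Mark the source blob so the next run skips it.
        src_blob.set_blob_metadata({"copied": "true"})
```

Each run only pays for the blobs that still need copying, so repeated (timed-out) runs eventually converge on a complete copy.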

Upvotes: 1
