Allan Xu

Reputation: 9368

What is the mechanism that prevents a scaled-out Azure Function from being triggered by the same blob multiple times?

Scenario:

An Azure Function is hosted on an App Service plan and scaled out to 5 instances. The function is triggered by a blob.

Question:

Is there any documentation that explains the mechanism that prevents a scaled-out Azure Function from processing the same blob multiple times? I am asking because more than one instance of the function is running.

Upvotes: 3

Views: 899

Answers (2)

Jerry Liu

Reputation: 17800

I agree with @Peter. Here is my understanding for reference; correct me if it doesn't make sense.

Information related to the blob trigger mechanism is stored in the Azure storage account for the Function app (defined by the app setting AzureWebJobsStorage). The locks are located in a blob container named azure-webjobs-hosts, and there is a queue named azure-webjobs-blobtrigger-<FunctionAppName> for internal use.

Another part of the same comment says:

Normally only 1 of N host instances is scanning for new blobs (based on a singleton host id lock). When it finds a new blob it adds a queue message for it and one of the N hosts processes it.

So in the first step, scanning for new blobs, scale-out does not take part. The singleton host ID lock is implemented with a blob lease, as @Peter mentioned (check the blob locks/<FunctionAppName>/host in azure-webjobs-hosts).

Once the internal queue starts receiving messages for new blobs, scale-out comes into play: host instances fetch and process the messages together. While a message is being processed, it is invisible to other instances, and it is deleted afterwards.
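To make the queue step concrete, here is a minimal in-memory sketch of the "invisible while being processed" behaviour. It is not the Functions runtime's code: ToyQueue, Message, and the 30-second visibility timeout are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare messages by identity, not by content
class Message:
    body: str
    visible_at: float = 0.0  # time from which the message may be dequeued


@dataclass
class ToyQueue:
    """In-memory stand-in for the internal blob-trigger queue (illustrative only)."""
    messages: List[Message] = field(default_factory=list)

    def enqueue(self, body: str) -> None:
        self.messages.append(Message(body))

    def dequeue(self, now: float, visibility_timeout: float = 30.0) -> Optional[Message]:
        # Hand out the first visible message and hide it from other consumers
        # until the (assumed) visibility timeout elapses.
        for msg in self.messages:
            if msg.visible_at <= now:
                msg.visible_at = now + visibility_timeout
                return msg
        return None

    def delete(self, msg: Message) -> None:
        # Called by the consumer once processing has succeeded.
        self.messages.remove(msg)


queue = ToyQueue()
queue.enqueue("container/a.txt")
first = queue.dequeue(now=0.0)   # one instance picks up the message
second = queue.dequeue(now=1.0)  # another instance polls and sees nothing
queue.delete(first)              # the message is removed after processing
```

In this model, if the first consumer never calls delete, the message becomes visible again once the timeout passes, so another instance can retry it.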

In addition, blob receipts are another mechanism: they ensure that a blob that has already been processed never triggers the function again (e.g. in the next round of scanning).
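The idea of a receipt can be sketched as a "have I seen this exact blob version before?" check. This is a simplified model, assuming a receipt is keyed by function name, blob path, and ETag; ToyReceiptStore and try_claim are hypothetical names, and the real runtime stores its receipts in the storage account rather than in memory.

```python
from typing import Set, Tuple


class ToyReceiptStore:
    """In-memory stand-in for blob receipts (illustrative only)."""

    def __init__(self) -> None:
        self._receipts: Set[Tuple[str, str, str]] = set()

    def try_claim(self, function_name: str, blob_path: str, etag: str) -> bool:
        # Assumed key: one receipt per (function, blob, blob version).
        key = (function_name, blob_path, etag)
        if key in self._receipts:
            return False  # already handled, do not trigger again
        self._receipts.add(key)
        return True


receipts = ToyReceiptStore()
first_scan = receipts.try_claim("ProcessBlob", "container/a.txt", "etag-1")
next_scan = receipts.try_claim("ProcessBlob", "container/a.txt", "etag-1")
after_update = receipts.try_claim("ProcessBlob", "container/a.txt", "etag-2")
```

Here the second scan finds an existing receipt and skips the blob, while a changed ETag (a modified blob) is treated as new work.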

Upvotes: 3

Peter Bons

Reputation: 29840

As far as I can tell blob leases are used.

This is backed by a comment made by an MS engineer working on the Azure Functions team:

The singleton mechanism used under the covers to ensure only one host processes a blob is based on the HostId. In regular scale out scenarios, the HostId is the same for all instances, so they collaborate via blob leases behind the scenes using the same lock blob scoped to the host id.
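The lease-based collaboration described in that comment can be illustrated with a small in-memory model. It is only a sketch: ToyLeaseBlob and elect_scanner are hypothetical names, the 15-second lease duration is an assumed value, and no real storage calls are made.

```python
import uuid
from typing import List, Optional


class ToyLeaseBlob:
    """In-memory stand-in for a lock blob that supports leases (illustrative only)."""

    def __init__(self) -> None:
        self._lease_id: Optional[str] = None
        self._expires_at = 0.0

    def try_acquire(self, now: float, duration: float = 15.0) -> Optional[str]:
        # Fail if another holder still has an unexpired lease.
        if self._lease_id is not None and now < self._expires_at:
            return None
        self._lease_id = uuid.uuid4().hex
        self._expires_at = now + duration
        return self._lease_id

    def renew(self, lease_id: str, now: float, duration: float = 15.0) -> bool:
        # Only the current holder can extend the lease, and only before it expires.
        if lease_id != self._lease_id or now >= self._expires_at:
            return False
        self._expires_at = now + duration
        return True


def elect_scanner(lock: ToyLeaseBlob, instances: List[str], now: float) -> List[str]:
    """Return the instances that obtained the lease at time `now` (at most one)."""
    winners = []
    for name in instances:
        if lock.try_acquire(now) is not None:
            winners.append(name)
    return winners


lock = ToyLeaseBlob()
hosts = ["instance-1", "instance-2", "instance-3", "instance-4", "instance-5"]
winners = elect_scanner(lock, hosts, now=0.0)
```

Because every instance shares the same host ID, they all contend for the same lock blob, and only the lease holder does the singleton work. If the holder stops renewing, the lease expires and another instance can take over.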

Upvotes: 3
