Eduard Keilholz

Reputation: 933

Azure Functions updating a Table Storage entity, concurrency problem

First, this may end up being a noob question, but still... I'm trying to find a good solution for this concept, and Google just isn't working out for me.

I'm creating an import with Azure Functions. A function is triggered when a blob is uploaded to a Blob Storage container. The function reads the file and passes each import entry to a validation queue. When validation fails, the entity is passed to a failure queue. When validation succeeds, the entity is queued on a success queue.

Another function reads the success queue and starts writing the entities to table storage. When that process fails, the entity will be queued to the failure queue.

Entities in the failure queue are handled and stored so the user knows what went wrong, and the original import data is stored so the user can fix the error.

In my opinion, this is the best way to implement an import procedure with functions. It's a lot of functions, I know... but they all have a single responsibility (e.g. reading the file, validation, etc.).

Now my problem is: when the entity is stored successfully OR when the failed entity is stored successfully, I also send a message to a status queue telling whether the import for a certain entity succeeded or not. Another Azure Function handles that status queue and updates A SINGLE entry in table storage with the latest status. The status for an import looks like this:

CorrelationId <-- The import identifier
StartedAt (date time)
TotalEntries (int)
Succeeded (int)
Failed (int)
CompletedAt (date time)

Obviously, the Succeeded or Failed int is increased by one for each message on the queue. Also, as the queue grows, the number of Azure Functions instances grows and more functions start updating the table storage entry concurrently, resulting in errors. I can choose to use the ETag, which causes some queue entries to fail (and it's slow!!), or I can set the ETag to "*", which makes the process much faster, but then I simply lose data. How should you handle / solve such a situation?
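For reference, this is roughly the ETag approach I mean: a minimal sketch in Python with the azure-data-tables SDK (the table name, partition key, and connection string setting are all made up for the example):

```python
import os

from azure.core import MatchConditions
from azure.core.exceptions import ResourceModifiedError
from azure.data.tables import TableClient, UpdateMode

# Hypothetical table and key names, just for illustration.
client = TableClient.from_connection_string(
    os.environ["STORAGE_CONNECTION_STRING"], table_name="ImportStatus"
)

def increment_status(correlation_id: str, field: str, max_attempts: int = 5) -> None:
    """Read-modify-write with an ETag check; retry when another instance wins the race."""
    for _ in range(max_attempts):
        entity = client.get_entity(partition_key="import", row_key=correlation_id)
        entity[field] = entity.get(field, 0) + 1
        try:
            client.update_entity(
                entity,
                mode=UpdateMode.MERGE,
                etag=entity.metadata["etag"],
                match_condition=MatchConditions.IfNotModified,
            )
            return  # update accepted
        except ResourceModifiedError:
            continue  # entity changed since we read it; re-read and try again
    raise RuntimeError(f"Gave up updating {correlation_id} after {max_attempts} attempts")
```

Every retry costs an extra read plus another write attempt, which is where the slowness comes from when many instances hammer the same entity.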

Thanks a bunch!

Upvotes: 0

Views: 509

Answers (2)

Thiago Custodio

Reputation: 18362

Azure Table Storage is not the best option for this type of scenario. You should create your materialized view on SQL Database or Cosmos DB, which handle concurrency/transactions better.
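For example, on SQL Database a single atomic UPDATE lets the database serialize the concurrent increments for you. A rough sketch (assuming a hypothetical ImportStatus table and pyodbc; names are illustrative):

```python
import pyodbc

# Assumes a SQL table: ImportStatus(CorrelationId, TotalEntries, Succeeded, Failed, ...)
def record_result(conn_str: str, correlation_id: str, succeeded: bool) -> None:
    column = "Succeeded" if succeeded else "Failed"  # fixed set, safe to interpolate
    with pyodbc.connect(conn_str) as conn:
        # One atomic UPDATE per queue message: the database serializes the
        # concurrent increments, so no ETag handling is needed at all.
        conn.execute(
            f"UPDATE ImportStatus SET {column} = {column} + 1 WHERE CorrelationId = ?",
            correlation_id,
        )
        conn.commit()
```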

As another option, you could use Event Hubs / Stream Analytics to give you real-time statistics.

Upvotes: 1

Gaurav Mantri

Reputation: 136356

Not really an answer to your question but think of it as some "food for thought" :).

What you can do is create a table (let's call it StatusTracker) where you'll store the status of each operation. This would be a simple table with PartitionKey as a ticks (or reverse ticks) representation of the date/time (you can set it at minute granularity so that the data for a single minute is stored in a single partition), RowKey as the unique id of the operation, and then a status attribute which tells you whether the operation failed or succeeded. You can use this table to provide near real-time feedback to the user. Since you're only ever inserting records into this table, you will not run into concurrency issues.
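A minimal sketch of such an insert (Python, azure-data-tables; the reverse-ticks arithmetic here is illustrative rather than the exact .NET tick format, and the names are made up):

```python
import os
import uuid
from datetime import datetime, timezone

from azure.data.tables import TableClient

tracker = TableClient.from_connection_string(
    os.environ["STORAGE_CONNECTION_STRING"], table_name="StatusTracker"
)

def track_operation(correlation_id: str, succeeded: bool) -> None:
    # Minute-granularity "reverse ticks": the newest minute sorts first, and
    # all operations within one minute land in the same partition.
    minute = datetime.now(timezone.utc).replace(second=0, microsecond=0)
    partition = str(10**19 - int(minute.timestamp() * 10_000_000)).zfill(20)

    # Inserts only -- no entity is ever updated, so there's nothing to conflict on.
    tracker.create_entity({
        "PartitionKey": partition,
        "RowKey": str(uuid.uuid4()),  # unique id of the operation
        "CorrelationId": correlation_id,
        "Status": "Succeeded" if succeeded else "Failed",
    })
```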

Then you can write another function (Oh no!... Not another function :)) that is timer triggered (runs every 5 minutes, for example). What it will do is query the first table (StatusTracker), summarize the data, and update your status table. Since the status table is only updated by this single function, again you will not run into concurrency issues.
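Sketched with the Python v2 programming model (again, table names, keys, and the connection string setting are assumptions, and a real version would only scan the partitions added since the previous run):

```python
import os

import azure.functions as func
from azure.data.tables import TableClient, UpdateMode

app = func.FunctionApp()

@app.schedule(schedule="0 */5 * * * *", arg_name="timer", run_on_startup=False)
def summarize_status(timer: func.TimerRequest) -> None:
    conn = os.environ["STORAGE_CONNECTION_STRING"]
    tracker = TableClient.from_connection_string(conn, table_name="StatusTracker")
    status = TableClient.from_connection_string(conn, table_name="ImportStatus")

    # Count results per import from the insert-only tracker table.
    counts: dict = {}
    for row in tracker.list_entities():
        per_import = counts.setdefault(row["CorrelationId"], {"Succeeded": 0, "Failed": 0})
        per_import[row["Status"]] += 1

    # This timer function is the only writer of the summary entity, so a plain
    # merge without an ETag check is safe here.
    for correlation_id, totals in counts.items():
        status.upsert_entity(
            {
                "PartitionKey": "import",
                "RowKey": correlation_id,
                "Succeeded": totals["Succeeded"],
                "Failed": totals["Failed"],
            },
            mode=UpdateMode.MERGE,
        )
```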

Upvotes: 0
