silijon

Reputation: 942

Azure Table Storage batch insert with potentially pre-existing rowkeys

I'm trying to send a simple batch of Insert operations to Azure Table Storage, but it seems the whole batch transaction is invalidated and, using the managed Azure Storage client (2.0), the ExecuteBatch method itself throws an exception if even a single Insert in the batch targets a pre-existing record:

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public class SampleEntity : TableEntity
{
    public SampleEntity() { }  // parameterless constructor required for deserialization

    public SampleEntity(string partKey, string rowKey)
    {
        this.PartitionKey = partKey;
        this.RowKey = rowKey;
    }
}


var acct = CloudStorageAccount.DevelopmentStorageAccount;
var client = acct.CreateCloudTableClient();
var table = client.GetTableReference("SampleEntities");
table.CreateIfNotExists();

var foo = new SampleEntity("partition1", "preexistingKey");
var bar = new SampleEntity("partition1", "newKey");

var batchOp = new TableBatchOperation();
batchOp.Add(TableOperation.Insert(foo));
batchOp.Add(TableOperation.Insert(bar));

var result = table.ExecuteBatch(batchOp);  // throws StorageException: "0:The specified entity already exists."

The batch-level exception can be avoided by using InsertOrMerge, but then every individual operation response returns a 204, regardless of whether that particular operation inserted or merged the record. So it seems impossible for the client application to know whether it, or another node in the cluster, inserted the record. Unfortunately, in my current case, this knowledge is necessary for some downstream synchronization.

Is there some configuration or technique to allow the batch of inserts to proceed and return the particular response code per-item without throwing a blanket exception?

Upvotes: 2

Views: 3846

Answers (1)

Gaurav Mantri

Reputation: 136146

As you already know, since a batch is a transactional operation you get an all-or-none kind of deal. One interesting thing about batch transactions is that you get the index of the first failed entity in the batch. So assuming you're trying to insert 100 entities in a batch and the 50th entity is already present in the table, the batch operation will give you the index of the failed entity (49 in this case).
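A sketch of pulling that index out of the exception with the 2.0 client. The error message follows the "index:message" format shown in the question (e.g. "0:The specified entity already exists."), so parsing the prefix before the first colon is an assumption based on that observed format:

```csharp
try
{
    table.ExecuteBatch(batchOp);
}
catch (StorageException ex)
{
    // For a batch, ExtendedErrorInformation.ErrorMessage begins with the
    // zero-based index of the first failed operation, e.g.
    // "0:The specified entity already exists."
    string msg = ex.RequestInformation.ExtendedErrorInformation.ErrorMessage;
    int separator = msg.IndexOf(':');
    int failedIndex;
    if (separator > 0 && int.TryParse(msg.Substring(0, separator), out failedIndex))
    {
        // failedIndex now points at the offending operation in batchOp
    }
}
```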

Is there some configuration or technique to allow the batch of inserts to proceed and return the particular response code per-item without throwing a blanket exception?

I don't think so. The transaction would fail as soon as the first entity fails. It will not even attempt to process other entities.

Possible Solutions (Just thinking out loud :))

If I understand correctly, your key requirement is to identify whether an entity was inserted or merged (or replaced). For this, the approach would be to separate the failed entities out of the batch and process them separately. Based on that, I can think of two approaches:

  1. What you could possibly do in this case is split that batch into 3 batches: the 1st batch will contain the 49 entities before the failure, the 2nd batch will contain just the 1 entity which failed, and the 3rd batch will contain the remaining 50 entities. You could now insert all entities in the 1st batch, decide what you want to do with the failed entity, and try to insert the 3rd batch. You would need to repeat the process each time another failure occurs, until every entity has been handled.
  2. Another idea would be to remove the failed entity from the batch and retry that batch. So in the example above, on your 1st attempt you'll try with 100 entities, on your 2nd attempt with 99 entities, and so on, keeping track of the failed entities all the while (along with the reason they failed). Once the batch operation completes successfully, you can work with all the failed entities.
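The second approach can be sketched roughly as follows. This is just an illustration, not a definitive implementation: it assumes the "index:message" error format shown in the question, and it relies on TableBatchOperation implementing IList&lt;TableOperation&gt; for the indexer and RemoveAt:

```csharp
// Copy the original operations so we can whittle them down on each retry.
var pending = new TableBatchOperation();
foreach (var op in batchOp)
    pending.Add(op);

var failed = new List<TableOperation>();

while (pending.Count > 0)
{
    try
    {
        table.ExecuteBatch(pending);
        break;  // everything remaining was inserted successfully
    }
    catch (StorageException ex)
    {
        // Parse the zero-based index of the failed operation from the
        // "index:message" error string, remove it, and retry the rest.
        string msg = ex.RequestInformation.ExtendedErrorInformation.ErrorMessage;
        int failedIndex = int.Parse(msg.Substring(0, msg.IndexOf(':')));
        failed.Add(pending[failedIndex]);
        pending.RemoveAt(failedIndex);
    }
}

// 'failed' now holds every operation that could not be inserted, which
// gives you the insert-vs-already-existed knowledge you were after.
```

Note that each retry re-sends the surviving operations, so a batch with many conflicts costs one round trip per conflict; for mostly-new data this is usually acceptable.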

Upvotes: 1
