Peter Taylor

Reputation: 5036

Can I measure a TableBatchOperation's size?

The .Net SDK documentation for TableBatchOperation says that

A batch operation may contain up to 100 individual table operations, with the requirement that each operation entity must have same partition key. A batch with a retrieve operation cannot contain any other operations. Note that the total payload of a batch operation is limited to 4MB.

It's easy to ensure that I don't add more than 100 individual table operations to the batch: in the worst case, I can check the Count property. But is there any way to check the payload size other than manually serialising the operations (at which point I've lost most of the benefit of using the SDK)?

Upvotes: 1

Views: 1280

Answers (2)

Peter Taylor

Reputation: 5036

I followed Emily Gerner's suggestion of optimistic inserts and error handling, but using StorageException.RequestInformation.EgressBytes to estimate the number of operations that fit within the limit. Unless the sizes of the operations vary wildly, this should be more efficient. There is a case to be made for not resetting len back to the optimistic maximum for every outer iteration, but here's an implementation which goes back to being optimistic each time.

        int off = 0;
        while (off < ops.Count)
        {
            // Batch size: start optimistically with the maximum of 100 operations.
            int len = Math.Min(100, ops.Count - off);
            // Retry the same slice, shrinking len, until the batch fits.
            while (true)
            {
                var batch = new TableBatchOperation();
                for (int i = 0; i < len; i++) batch.Add(ops[off + i]);

                try
                {
                    _Tbl.ExecuteBatch(batch);
                    break;
                }
                catch (Microsoft.WindowsAzure.Storage.StorageException se)
                {
                    var we = se.InnerException as WebException;
                    var resp = we != null ? (we.Response as HttpWebResponse) : null;
                    if (resp != null && resp.StatusCode == HttpStatusCode.RequestEntityTooLarge)
                    {
                        // Assume roughly equal sizes, and base the updated length on the
                        // egress (bytes sent) of the request which was rejected.
                        // We assume that no individual operation is too big!
                        len = len * 4000000 / (int)se.RequestInformation.EgressBytes;
                    }
                    else throw;
                }
            }

            off += len;
        }

Upvotes: 1

Emily Gerner

Reputation: 2457

As you add entities you can track the size of the names plus data. Assuming you're using a newer library where the default payload format is JSON, the extra characters the serializer adds should be relatively small (compared to the data, if you're close to 4MB) and estimable. This isn't a perfect route, but it would get you close.
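A rough running estimate could look like the sketch below. This is only a sketch: EstimateEntitySize and InsertInBatches are made-up helper names, and the overhead constants (100, 32, the 3,500,000 safety threshold) are guesses at the JSON envelope rather than exact wire sizes. It assumes DynamicTableEntity, inserts only, and that all entities share a partition key.

    using System.Collections.Generic;
    using Microsoft.WindowsAzure.Storage.Table;

    // Rough per-entity payload estimate: keys, property names and values,
    // plus crude allowances for the JSON/OData envelope.
    static int EstimateEntitySize(DynamicTableEntity entity)
    {
        int size = 100 + 2 * (entity.PartitionKey.Length + entity.RowKey.Length);
        foreach (KeyValuePair<string, EntityProperty> prop in entity.Properties)
        {
            object value = prop.Value.PropertyAsObject;
            // Property name + a crude guess at the serialized value length
            // + an allowance for quotes, separators and type metadata.
            size += prop.Key.Length
                  + (value == null ? 4 : value.ToString().Length)
                  + 32;
        }
        return size;
    }

    // Flush whenever the next entity would push the batch past the safety
    // threshold below 4MB, or past the 100-operation limit.
    static void InsertInBatches(CloudTable table, IEnumerable<DynamicTableEntity> entities)
    {
        const int PayloadLimit = 3500000;
        int estimatedPayload = 0;
        var batch = new TableBatchOperation();
        foreach (DynamicTableEntity entity in entities)
        {
            int entitySize = EstimateEntitySize(entity);
            if (batch.Count > 0 &&
                (batch.Count == 100 || estimatedPayload + entitySize > PayloadLimit))
            {
                table.ExecuteBatch(batch);
                batch = new TableBatchOperation();
                estimatedPayload = 0;
            }
            batch.Add(TableOperation.Insert(entity));
            estimatedPayload += entitySize;
        }
        if (batch.Count > 0) table.ExecuteBatch(batch);
    }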

Serializing as you go, especially if you're frequently getting close to the 100-entity limit or the 4MB limit, is going to cost you a lot of perf, aside from any convenience lost. Rather than trying to track as you go, either by estimating size or by serializing, you might be best off sending the batch request as-is and, if you get a 413 indicating the request body is too large, catching the error, dividing the batch in two, and continuing.
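A minimal sketch of that send-then-split approach might look like this. ExecuteWithSplit is a made-up helper name, the 413 is detected via StorageException.RequestInformation.HttpStatusCode, and it assumes the operations all share a partition key and have already been chunked to at most 100 per call.

    using System.Collections.Generic;
    using System.Linq;
    using System.Net;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Table;

    // Send the whole batch; on 413 (request body too large) split it in two and retry.
    static void ExecuteWithSplit(CloudTable table, IList<TableOperation> ops)
    {
        var batch = new TableBatchOperation();
        foreach (TableOperation op in ops) batch.Add(op);

        try
        {
            table.ExecuteBatch(batch);
        }
        catch (StorageException se)
        {
            if (se.RequestInformation.HttpStatusCode != (int)HttpStatusCode.RequestEntityTooLarge) throw;
            if (ops.Count == 1) throw; // a single operation alone exceeds the limit

            int half = ops.Count / 2;
            ExecuteWithSplit(table, ops.Take(half).ToList());
            ExecuteWithSplit(table, ops.Skip(half).ToList());
        }
    }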

Upvotes: 3
