Dumping Azure tables quickly

Question

My task is to dump entire Azure tables with arbitrary unknown schemas. Standard code to do this resembles the following:

TableQuery query = new TableQuery();
foreach (DynamicTableEntity entity in table.ExecuteQuery(query))
{
  // Write a dump of the entity (row).
}

Depending on the table, this works at a rate of 1000-3000 rows per second on my system. I'm guessing this (lack of) performance has something to do with separate HTTP requests issued to retrieve the data in chunks. Unfortunately, some of the tables are multi-gigabyte in size, so this takes a rather long time.

Is there a good way to parallelize the above or speed it up some other way? It would seem that those HTTP requests could be sent by multiple threads, as in web crawlers and the like. However, I don't see an obvious way to do so.

Gaurav Mantri · Accepted Answer

Unless you know the PartitionKeys of the entities in the table (or some other query criterion that includes PartitionKey), AFAIK you have to take the top-down approach you're using right now. To fire queries in parallel and have them run efficiently, each query has to include PartitionKey.
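
If you do have a rough idea of how the PartitionKey values are distributed, a minimal sketch of the parallel version could look like the following. The class ParallelTableDump, the helper RangeFilter, the boundaries list and the write callback are all hypothetical names made up for illustration. The split points have to come from your own knowledge of the data, and the sketch assumes the same Microsoft.WindowsAzure.Storage.Table client library as the code in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

static class ParallelTableDump
{
    // Hypothetical helper: builds a filter for lower <= PartitionKey < upper.
    // A null bound leaves the range open on that side.
    static string RangeFilter(string lower, string upper)
    {
        string lo = lower == null ? null : TableQuery.GenerateFilterCondition(
            "PartitionKey", QueryComparisons.GreaterThanOrEqual, lower);
        string hi = upper == null ? null : TableQuery.GenerateFilterCondition(
            "PartitionKey", QueryComparisons.LessThan, upper);

        if (lo == null) return hi;
        if (hi == null) return lo;
        return TableQuery.CombineFilters(lo, TableOperators.And, hi);
    }

    // boundaries: assumed split points in ascending (lexical) PartitionKey order.
    // write: your dump routine; it is called from several threads at once,
    // so it must be thread-safe.
    public static void Dump(CloudTable table, IList<string> boundaries,
                            Action<DynamicTableEntity> write)
    {
        // N boundaries produce N + 1 non-overlapping ranges covering the table.
        var ranges = new List<Tuple<string, string>>();
        string previous = null;
        foreach (string boundary in boundaries)
        {
            ranges.Add(Tuple.Create(previous, boundary));
            previous = boundary;
        }
        ranges.Add(Tuple.Create(previous, (string)null));

        Parallel.ForEach(ranges, range =>
        {
            var query = new TableQuery();
            string filter = RangeFilter(range.Item1, range.Item2);
            if (filter != null)
            {
                query = query.Where(filter);
            }

            // Each range is enumerated independently, chunk by chunk.
            foreach (DynamicTableEntity entity in table.ExecuteQuery(query))
            {
                write(entity);
            }
        });
    }
}
```

For example, Dump(table, new[] { "g", "n", "t" }, WriteRow) would run four range queries side by side, where WriteRow stands in for whatever writes your dump. PartitionKey comparisons are plain string comparisons, so the boundaries only help if they split the actual keys reasonably evenly. On the .NET Framework you would probably also need to raise ServicePointManager.DefaultConnectionLimit, since the default of 2 connections per host would otherwise serialize most of the requests.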
