brainjam
brainjam

Reputation: 19005

How many shards in a Google App Engine sharded counter?

I read today about sharded counters in Google App Engine. The article says that you should expect to max out at about 5/updates per second per entity in the data store. But it seems to me that this solution doesn't 'scale' unless you have some way of knowing how many updates you are doing per second. For example, you can allocate 10 shards, but will then start choking at 50 updates per second.

So how do you know how fast the updates are coming, and how do you feed that number back into the number of shards?

My guess is that along with the counter you could keep some record of recent activity, and if you detect a spike you can increase the number of shards. Is that generally how it's done? And if so, why isn't it done in the sample code? (That last question may be unanswerable.) Is it more common practice to monitor website activity and update shard counts as traffic rises, as opposed to doing it automatically in the code?

Update: What are the practical consequences effects of having too few shards and choking? Does it simply mean that the website becomes unresponsive, or is it possible to lose counter updates because of timeouts?


As an aside, this question talks about implementing counters without sharding, but one of the answers impies that even memcache needs to be sharded if traffic is high. So this issue of shard allocation and tuning seems to be important.

Upvotes: 11

Views: 2228

Answers (3)

DougA
DougA

Reputation: 1255

Why not add to the number of shards when Exceptions begin to occur?

Based on this GAE Example:

try{
  Transaction tx = ds.beginTransaction();
  // increment shard
  tx.commit();           
} catch(DatastoreFailureException e){
   // Datastore is struggling to handle the current load, increase it / double it
   addShards( getShardCount() );

} catch(DatastoreTimeoutException to){
   // Datastore is struggling to handle the current load, increase it / double it 
   addShards( getShardCount() );

} catch (ConcurrentModificationException cm){
   // Datastore is struggling to handle the current load, increase it / double it 
   addShards( getShardCount() );             

}

Upvotes: 2

Nick Johnson
Nick Johnson

Reputation: 101149

To address the last part of your question: Your memcache values will not require sharding. A single memcache server can handle tens of thousands of QPS of fetches and updates, so no plausibly large app is going to need to shard its memcache keys.

Upvotes: 3

David Underhill
David Underhill

Reputation: 16243

It is clearly simpler to manually monitor your website's popularity and increase the number of shards as needed. I would guess that most sites take this approach. Doing it programatically would not only be difficult, but it sounds like it would add an unacceptable amount of overhead to keep a record of all recent activity and try to analyze it to dynamically adjust the number of shards you're using.

I would prefer the simpler approach of just erring a little on the high side with the number of shards you choose.

You are correct about the practical consequences of having too few shards. Updating a datastore entity more frequently than possible which will initially cause some requests to take a long time (while the writes retry). If you have enough of them pile up, then they will start to fail as requests time out. This will certainly lead to missed counters. On the upside, your page will be so slow that users should start leaving which should relieve the pressure on the datastore :).

Upvotes: 4

Related Questions