Juho Ojala

Reputation: 233

Is there a per-request limit for simultaneous transactions?

I'm using a lot of (sharded) counters in my application. According to my current design, a single request can cause 100-200 different counters to be incremented.

So for each counter I pick one shard and increment its value. Each shard is incremented in its own transaction, which means I will end up doing 100-200 transactions as part of processing a single request. Naturally I intend to do this asynchronously, so that I will essentially be running all 100-200 transactions in parallel.
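To make this concrete, the per-counter update I have in mind looks roughly like the sketch below (written in Go against the classic appengine API just for illustration; the shard type, numShards and pickShardKey are made up for this example):

package counters

import (
    "fmt"
    "math/rand"

    "appengine"
    "appengine/datastore"
)

// counterShard is a single shard entity holding one partial count.
type counterShard struct {
    Count int64
}

// pickShardKey picks one of numShards shard entities for the given counter.
func pickShardKey(c appengine.Context, counter string) *datastore.Key {
    const numShards = 20
    name := fmt.Sprintf("%s-shard%d", counter, rand.Intn(numShards))
    return datastore.NewKey(c, "CounterShard", name, 0, nil)
}

// incrementAll fires one transactional increment per counter, all in parallel.
func incrementAll(c appengine.Context, counters []string) error {
    errc := make(chan error, len(counters))
    for _, counter := range counters {
        counter := counter
        go func() {
            errc <- datastore.RunInTransaction(c, func(tc appengine.Context) error {
                key := pickShardKey(tc, counter)
                var s counterShard
                if err := datastore.Get(tc, key, &s); err != nil && err != datastore.ErrNoSuchEntity {
                    return err
                }
                s.Count++
                _, err := datastore.Put(tc, key, &s)
                return err
            }, nil)
        }()
    }
    var firstErr error
    for range counters {
        if err := <-errc; err != nil && firstErr == nil {
            firstErr = err
        }
    }
    return firstErr
}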

As this number feels pretty high, I'm left wondering whether there is some per-request or per-instance limit on the number of simultaneous transactions (or datastore requests). I could not find information on this in the documentation.

By the way, for some reason Google's documentation states that "if your app has counters that are updated frequently, you should not increment them transactionally" [1], yet their own code example on sharding counters uses a transaction to increment a shard [2]. I figure I can use transactions as long as I use enough shards. I prefer transactions, as I would like my counters not to miss increments.

  1. https://cloud.google.com/appengine/docs/java/datastore/transactions
  2. https://cloud.google.com/appengine/articles/sharding_counters

Upvotes: 2

Views: 713

Answers (3)

Juho Ojala

Reputation: 233

Thanks for the responses! I think I now have the answers I need.

Regarding the per-request or per-instance limit

There is a per-instance limit on concurrent threads, which effectively limits the number of concurrent transactions. The default limit is 10. It can be raised, but it is unclear what side effects that will have.

Regarding the underlying problem

I chose to divide the counters into groups in such a way that counters that are usually incremented "together" end up in the same group. Each shard carries partial counts for all counters within the group it is associated with.

Counts are still incremented in transactions, but thanks to the grouping at most five transactions are needed per request. Each transaction increments numerous partial counts stored in a single shard, which is represented as a single datastore entity.

Even if the transactions are run in series, the time to process a request will still be acceptable. Each counter group has a few hundred counters. I make sure there are enough shards to avoid contention.

It should be noted that this solution is only possible because the counters can be divided into fairly large groups of counters that are typically incremented together.
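For illustration, a request's updates now look roughly like the sketch below (Go, assuming it sits in the same package as dyoo's complete example further down, so it reuses his shard type, shardKind, NewShard and Increment; the group names and the shardsPerGroup value are illustrative):

// IncrementGrouped runs one transaction per counter group touched by the request.
// byGroup maps a group name to the counter names to bump; the grouping scheme
// itself is application-specific.
func IncrementGrouped(c appengine.Context, byGroup map[string][]string) error {
    const shardsPerGroup = 50 // enough shards per group to keep contention low
    for group, names := range byGroup {
        shardName := fmt.Sprintf("%s-shard%d", group, rand.Intn(shardsPerGroup))
        err := datastore.RunInTransaction(c, func(tc appengine.Context) error {
            // No shared ancestor: each shard is its own entity group.
            key := datastore.NewKey(tc, shardKind, shardName, 0, nil)
            s := NewShard()
            if err := datastore.Get(tc, key, s); err != nil && err != datastore.ErrNoSuchEntity {
                return err
            }
            for _, name := range names {
                s.Increment(name)
            }
            _, err := datastore.Put(tc, key, s)
            return err
        }, nil)
        if err != nil {
            return err
        }
    }
    return nil
}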

Upvotes: 1

Nick

Reputation: 1822

There are three limitations that will probably cause you problems here:

  • 1/sec write limit per entity group
  • 5 entity groups per cross-group (XG) transaction
  • 10 concurrent 'threads' per instance

The last one is the tricky one for your use case.

It's a bit hard to find info on (and may in fact be out-of-date information, so it's worth testing), but each instance only allows 10 concurrent core threads (regardless of size - F1/F2/F...).

That is, ignoring the creation of background threads, if you were to assume that each request takes a thread, as does each RPC (datastore, memcache, text search etc), you can only use 10 at a time. If the scheduler thinks an incoming request would exceed 10, it will route the request to a new instance.

In a scenario where you want to write to 100 entities in parallel, I'd expect it to only allow about 10 concurrent writes (with the rest blocking), and your instance could then only service one request at a time.

Alternatives for you:

  • Use dedicated memcache - you'll need to handle backing the counters onto durable storage, but you could do that in batches on a backend. This may result in you losing some data due to cache flushes; whether that's OK or not, you'll have to decide
  • Use CloudSQL sequences or tables - if you don't require huge scale, but do require lots of counters, this may be a good approach - you could store counts as raw counts, or as time-series data and post-process it for accurate counts
  • Use pull queues to update counters in batches on a backend. You can process many 'events' across your many counter tables in larger batches. The downside is that the counts will not be up to date at any given point in time

The best approach is probably a hybrid.

For example, accepting some eventual consistency in counts (a rough Go sketch follows this list):

  • When a request comes in - atomic increment of counters in memcache
  • When a request comes in - queue an 'event' task
  • Serve needed counts from memcache - if not present load from the datastore
  • Use TTLs on memcache, so that eventually the datastore is seen as the 'source of truth'
  • Run a cron which pulls 100 'event' tasks off the queue every 5 minutes (or as appropriate), and updates counters for all the events in a transaction in the datastore
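Here is a rough sketch of the memcache and pull-queue half in Go (the queue name "counter-events", the handler names and the batch sizes are all illustrative, and the queue must be declared as a pull queue; the durable write-back is left as a comment):

package counterevents

import (
    "net/http"

    "appengine"
    "appengine/memcache"
    "appengine/taskqueue"
)

// RecordHit atomically bumps the counter in memcache and queues an 'event'
// task on a pull queue for later aggregation.
func RecordHit(c appengine.Context, counter string) error {
    // Missing keys are created with an initial value of 0, then incremented.
    if _, err := memcache.Increment(c, "counter:"+counter, 1, 0); err != nil {
        return err
    }
    _, err := taskqueue.Add(c, &taskqueue.Task{
        Payload: []byte(counter),
        Method:  "PULL",
    }, "counter-events")
    return err
}

// DrainEvents is the cron handler: lease a batch of events, aggregate them,
// and fold the deltas into the datastore (write-back not shown).
func DrainEvents(w http.ResponseWriter, r *http.Request) {
    c := appengine.NewContext(r)
    tasks, err := taskqueue.Lease(c, 100, "counter-events", 60)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    deltas := make(map[string]int64)
    for _, t := range tasks {
        deltas[string(t.Payload)]++
    }
    // Persist the aggregated deltas here, e.g. with the transactional
    // sharded-counter update the question describes.
    _ = deltas
    if err := taskqueue.DeleteMulti(c, tasks, "counter-events"); err != nil {
        c.Errorf("deleting leased tasks: %v", err)
    }
}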

UPDATE: I found this section in the docs, about controlling the maximum number of concurrent requests; it makes a nebulous reference to:

You may experience increased API latency if this setting is too high.

I'd say it's worth playing with.
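For reference, in a modules-style app.yaml the knob looks something like this (the value is only an example, not a recommendation):

automatic_scaling:
  max_concurrent_requests: 20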

Upvotes: 2

dyoo

Reputation: 12023

I see that you're using a sharded counter approach to avoid contention, as described in: cloud.google.com/appengine/articles/sharding_counters.

Can you collect all your counters in a single entity, so that each shard is a bunch of counters? Then you wouldn't need so many separate transactions. According to cloud.google.com/appengine/docs/python/ndb/#quotas, an entity can be 1MB max, and certainly 200 integers will fit into that size restriction just fine.

It may be that you don't know the property names in advance. Here is an approach, expressed in Go, that uses the datastore package's PropertyLoadSaver interface to deal with dynamic counter names.

const (
    counterPrefix = "COUNTER:"
)

type shard struct {
    // We manage the saving and loading of counters explicitly.
    counters map[string]int64 `datastore:"-"`
}

// NewShard constructs a new shard.
func NewShard() *shard {
    return &shard{make(map[string]int64)}
}

// Save implements PropertyLoadSaver.
func (s *shard) Save(c chan<- datastore.Property) error {
    defer close(c)
    for key, value := range s.counters {
        c <- datastore.Property{
            Name:    counterPrefix + key,
            Value:   value,
            NoIndex: true,
        }
    }
    return nil
}

// Load implements PropertyLoadSaver.
func (s *shard) Load(c <-chan datastore.Property) error {
    s.counters = make(map[string]int64)
    for prop := range c {
        if strings.HasPrefix(prop.Name, counterPrefix) {
            s.counters[prop.Name[len(counterPrefix):]] = prop.Value.(int64)
        }
    }
    return nil
}

The key is to use the raw API for defining your own property names when saving to the datastore. The Java API almost certainly has similar access, given the existence of PropertyContainer.

And the rest of the code described in the sharding article would be expressed in terms of manipulating a single entity that knows about multiple counters. So, for example, rather than having Increment() deal with a single counter:

// Increment increments the named counter.
func Increment(c appengine.Context, name string) error {
    ...
}

we'd change its signature to a bulk-oriented operation:

// Increment increments the named counters.
func Increment(c appengine.Context, names []string) error {
    ...
}

and the implementation would find a single shard, call Increment() for each of the counters we want to increment, and Save() that single entity to the datastore, all within a single transaction. Querying would also involve consulting all the shards... but reads are fast. We still maintain the sharding architecture to avoid write contention.


The complete example code for Go is:

package sharded_counter

import (
    "fmt"
    "math/rand"
    "strings"

    "appengine"
    "appengine/datastore"
)

const (
    numShards     = 20
    shardKind     = "CounterShard"
    counterPrefix = "counter:"
)

type shard struct {
    // We manage the saving and loading of counters explicitly.
    counters map[string]int64 `datastore:"-"`
}

// NewShard constructs a new shard.
func NewShard() *shard {
    return &shard{make(map[string]int64)}
}

// Names returns a list of the counter names stored in the shard.
func (s *shard) Names() []string {
    names := make([]string, 0, len(s.counters))
    for name := range s.counters {
        names = append(names, name)
    }
    return names
}

// Lookup finds the counter's value.
func (s *shard) Lookup(name string) int64 {
    return s.counters[name]
}

// Increment adds to the counter's value.
func (s *shard) Increment(name string) {
    s.counters[name]++
}

// Save implements PropertyLoadSaver.
func (s *shard) Save(c chan<- datastore.Property) error {
    for key, value := range s.counters {
        c <- datastore.Property{
            Name:    counterPrefix + key,
            Value:   value,
            NoIndex: true,
        }
    }
    close(c)
    return nil
}

// Load implements PropertyLoadSaver.
func (s *shard) Load(c <-chan datastore.Property) error {
    s.counters = make(map[string]int64)
    for prop := range c {
        if strings.HasPrefix(prop.Name, counterPrefix) {
            s.counters[prop.Name[len(counterPrefix):]] = prop.Value.(int64)
        }
    }
    return nil
}

// AllCounters returns all counters.
func AllCounters(c appengine.Context) (map[string]int64, error) {
    results := make(map[string]int64)
    q := datastore.NewQuery(shardKind)
    q = q.Ancestor(ancestorKey(c))
    for t := q.Run(c); ; {
        var s shard
        _, err := t.Next(&s)
        if err == datastore.Done {
            break
        }
        if err != nil {
            return results, err
        }
        for _, name := range s.Names() {
            results[name] += s.Lookup(name)
        }
    }
    return results, nil
}

// ancestorKey returns a key that all counter shards inherit.
func ancestorKey(c appengine.Context) *datastore.Key {
    return datastore.NewKey(c, "CountersAncestor", "CountersAncestor", 0, nil)
}

// Increment increments the named counters.
func Increment(c appengine.Context, names []string) error {
    shardName := fmt.Sprintf("shard%d", rand.Intn(numShards))
    err := datastore.RunInTransaction(c, func(c appengine.Context) error {
        key := datastore.NewKey(c, shardKind, shardName, 0, ancestorKey(c))
        s := NewShard()
        err := datastore.Get(c, key, s)
        // A missing entity and a present entity will both work.
        if err != nil && err != datastore.ErrNoSuchEntity {
            return err
        }
        for _, name := range names {
            s.Increment(name)
        }
        _, err = datastore.Put(c, key, s)
        return err
    }, nil)
    return err
}

This, if you look closely, is pretty much the example with a single, unnamed counter, extended to handle multiple counter names. I also changed the query side a little so that reads use the same ancestor key, keeping us in the same entity group.
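For completeness, a minimal (hypothetical) handler in the same package might use the bulk API like this; the counter names are just examples, and "net/http" would also need to be imported:

func handleHit(w http.ResponseWriter, r *http.Request) {
    c := appengine.NewContext(r)
    // Bump every counter touched by this request in a single transaction.
    if err := Increment(c, []string{"pageviews", "pageviews:home", "visitors"}); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    totals, err := AllCounters(c)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    fmt.Fprintf(w, "pageviews so far: %d\n", totals["pageviews"])
}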

Upvotes: 1
