Reputation: 447
I've got a Kafka topic which contains catalog data with the following commands:
Now I need to consume this topic, possibly streaming 100k msgs/sec, into some DB that will help me translate the original stream of commands into a stream of item states, so the DB holds only the current state of each item. Basically the DB will be used as a lookup.
My idea was:
My worries are about the ACID properties of Datastore. How "ACID" is it? Is it even suitable for such a use case?
I was also thinking about using the cheaper Bigtable, but that doesn't seem like the right choice for this use case.
If you have any ideas/recommendations on how else to solve this, I'd be glad to hear them.
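For context, a minimal sketch of the kind of consumer loop in question, assuming kafka-python, JSON-encoded command messages, and a hypothetical apply_command() that persists the current item state to whichever store ends up being chosen:

```python
# Minimal sketch, assuming kafka-python and JSON-encoded command messages.
# apply_command() is a hypothetical function that writes the resulting item
# state to whichever lookup store is chosen (Datastore, Bigtable, ...).
import json

from kafka import KafkaConsumer


def apply_command(command: dict) -> None:
    # Hypothetical: translate one catalog command into a write against
    # the store holding the current item state.
    raise NotImplementedError


consumer = KafkaConsumer(
    "catalog-commands",                    # assumed topic name
    bootstrap_servers=["localhost:9092"],  # assumed brokers
    group_id="catalog-state-builder",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    enable_auto_commit=False,
)

for message in consumer:
    apply_command(message.value)
    consumer.commit()  # commit offsets only after the state write succeeded
```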
Upvotes: 1
Views: 815
Reputation: 2711
Bigtable can handle a rate of 100K updates per second with a 10-node cluster (I have run tests up to 3,500 nodes, which handles 35M updates per second). Bigtable has strong consistency for single-row upserts, and Bigtable users design schemas that fit all of their transactional data into a single row.
Cloud Bigtable supports upserts and does not have a distinction between insert and update. There is also a delete by range that could theoretically be used for your delete_all case.
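A minimal sketch of what a single-row upsert, a row delete, and a prefix-based delete could look like with the Python google-cloud-bigtable client; the project, instance, table, column family, and row key layout are assumptions:

```python
# Minimal sketch using the Python google-cloud-bigtable client.
# Project/instance/table/column-family names and the row key scheme
# are assumptions.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")
table = instance.table("catalog")

# Upsert: writing a cell creates the row if it does not exist and
# overwrites the latest value if it does -- no insert/update distinction.
row = table.direct_row(b"item#123")
row.set_cell("state", "payload", b'{"title": "example item"}')  # "state" column family assumed to exist
row.commit()

# Delete a single item (whole row).
deletion = table.direct_row(b"item#123")
deletion.delete()
deletion.commit()

# Drop every row sharing a key prefix (one way to cover a delete_all case);
# this goes through the table admin API and is a heavyweight operation.
table.drop_by_prefix(b"item#")
```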
The high transaction rate and the lower cost are the right reasons to use Cloud Bigtable. Alternatively, you can consider using Cloud Spanner, which is meant for high-throughput transactional data.
Upvotes: 1
Reputation: 39834
The first concern is the message rate. Datastore cannot sustain per-entity-group write rates exceeding 1/sec (each entity is part of an entity group); see Limits. So if you expect more than one update per second for any single item/entity, Datastore is not suitable.
To achieve ACID with Cloud Datastore you need to avoid eventual consistency, which is possible. From Eventual Consistency when Reading Entity Values:
The eventual consistency on reading entity values can be avoided by using a keys-only query, an ancestor query, or lookup by key (the get() method). We will discuss these different types of queries in more depth below.
I would discard ancestor queries as a possibility, since they would require all the respective entities to be in the same entity group, amplifying the impact of the above-mentioned write limit. See also Updates to a single entity group.
The tricky part is the upsert operation, more specifically the difference between creating a new entity and updating/deleting an existing one.
If you can't always generate/determine a unique item identifier from the item data (or pass along one determined in a previous stage), then you'd need a query, which can't be executed inside a transaction and whose result would be subject to eventual consistency. Datastore won't be suitable in such a case either.
But if you can get such a unique identifier, then you can use it as the entity key identifier and things become simple: the upsert operation is a transactional attempt to get the entity by that key (strongly consistent) and, inside the same transaction (a sketch follows below):
- if the get fails with a "doesn't exist" result, create a new entity with that key
- if the get succeeds, update the entity and save it back
Upvotes: 0
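A minimal sketch of the transactional get-then-create-or-update flow described in the answer above, using the Python google-cloud-datastore client; the project, kind name, property names, and the item_id carried by the message are assumptions:

```python
# Minimal sketch of the transactional upsert described above, using the
# Python google-cloud-datastore client. Kind/property names and item_id
# (the unique identifier carried by the message) are assumptions.
from google.cloud import datastore

client = datastore.Client(project="my-project")


def upsert_item(item_id: str, state: dict) -> None:
    key = client.key("Item", item_id)
    with client.transaction():
        entity = client.get(key)                 # strongly consistent lookup by key
        if entity is None:
            entity = datastore.Entity(key=key)   # doesn't exist: create it
        entity.update(state)                     # exists: update in place
        client.put(entity)                       # applied atomically on commit


upsert_item("item-123", {"title": "example item", "price": 42})
```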