Matus Cimerman

Reputation: 447

Datastore/BigTable ACID and key update notifications

I've got Kafka topic which contains catalog data with following commands:

  1. item_upsert
  2. partial_item_update
  3. delete_item
  4. delete_all

Now I need to consume this topic, possibly streaming 100k msgs/sec, into some DB that will help me translate the original stream of commands into a stream of item states, so the DB holds only the current state of each item. Basically, the DB will be used as a lookup.

My idea was:

  1. Insert/Update/Delete items in Datastore,
  2. Once a specific message is processed, I'll send a new message to another stream telling downstream consumers that a certain item was inserted/updated/deleted. These consumers will afterwards read the current state of the item from Datastore and ingest the item state into another Kafka topic (a sketch of this pipeline follows the list).
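
A minimal sketch of this two-stage pipeline, assuming the kafka-python client; the topic names, the message schema, and the apply_command() helper that writes to the lookup DB are hypothetical, not part of the question:

    # Hypothetical pipeline sketch: topic names, the message schema, and
    # apply_command() (which writes the current item state to the lookup DB)
    # are assumptions.
    import json
    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer(
        "catalog-commands",                    # hypothetical source topic
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    for msg in consumer:
        cmd = msg.value                        # e.g. {"op": "item_upsert", "id": "123", ...}
        apply_command(cmd)                     # hypothetical: apply the command to the lookup DB
        # tell downstream consumers which item changed and how
        producer.send("item-changes", {"id": cmd.get("id"), "op": cmd["op"]})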

My worries are about the ACID properties of Datastore. How "ACID" is it? Is it even suitable for such a use case?

I was also thinking about using the cheaper Bigtable, but that doesn't seem like the right choice for this use case.

If you have any ideas/recommendations on how else to solve this, I'd be glad to hear them.

Upvotes: 1

Views: 815

Answers (2)

Solomon Duskis

Reputation: 2711

Bigtable can handle a rate of 100K updates/sec with a 10-node cluster (I have run tests with up to 3,500 nodes, which handle 35M updates per second). Bigtable has strong consistency for single-row upserts. Bigtable users design schemas that fit all of their transactional data into a single row.

Cloud Bigtable supports upserts and does not distinguish between insert and update. There is also a delete-by-range operation that could theoretically be used for your delete_all case.
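
For illustration, a minimal sketch with the Python google-cloud-bigtable client; the project/instance/table names, the "item#<id>" row-key scheme, and the "cf" column family are assumptions:

    # Hypothetical Bigtable sketch: the names, row-key scheme, and column
    # family are assumptions, not part of the answer.
    from google.cloud import bigtable

    client = bigtable.Client(project="my-project", admin=True)
    table = client.instance("my-instance").table("catalog")

    # item_upsert / partial_item_update: set_cell creates or overwrites
    # cells, so insert and update are the same operation
    row = table.direct_row(b"item#123")
    row.set_cell("cf", "payload", b'{"name": "widget"}')
    row.commit()

    # delete_item: delete the whole row
    row = table.direct_row(b"item#123")
    row.delete()
    row.commit()

    # delete_all: drop every row sharing a key prefix (an admin operation,
    # hence admin=True on the client)
    table.drop_by_prefix(b"item#")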

The high transaction rate and the lower cost are the right reasons to use Cloud Bigtable. Alternatively, you can consider using Cloud Spanner which is meant for high throughput transactional data.

Upvotes: 1

Dan Cornilescu

Reputation: 39834

The first concern is the message rate. The Datastore cannot sustain per-entity-group write rates exceeding 1/sec (each entity is part of an entity group); see Limits. So if you expect more than one update per second to any given item/entity, the Datastore is not suitable.

To achieve ACID with Cloud Datastore you need to avoid eventual consistency, which is possible. From Eventual Consistency when Reading Entity Values:

The eventual consistency on reading entity values can be avoided by using a keys-only query, an ancestor query, or lookup by key (the get() method). We will discuss these different types of queries in more depth below.

I would discard ancestor queries as a possibility, since they would require all the respective entities to be in the same entity group, amplifying the impact of the above-mentioned write limit. See also Updates to a single entity group.

The tricky part is the upsert operation, more specifically the difference between creating a new entity and updating/deleting an existing entity.

If you can't always generate/determine a unique item identifier from the item data (or pass along one determined in a previous stage), then you'd need a query, which can't be executed inside a transaction and whose result would be subject to eventual consistency. The Datastore won't be suitable in such a case either.

But if you can get such a unique identifier then you can use it as the entity key identifier and things are simple: the upsert operation becomes a simple transactional attempt to get the entity by that key (strongly consistent) and, inside the same transaction (a sketch follows the list):

  • if the get fails with a does-not-exist error, create a new entity with that key
  • if the get succeeds, update the entity and save it back
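
A minimal sketch of that transactional upsert with the Python google-cloud-datastore client; the "Item" kind and the shape of the data dict are assumptions:

    # Hypothetical sketch: the "Item" kind and the data dict are
    # assumptions, not part of the answer.
    from google.cloud import datastore

    client = datastore.Client()

    def upsert_item(item_id, data):
        key = client.key("Item", item_id)      # item_id is the unique identifier
        with client.transaction():
            entity = client.get(key)           # strongly consistent lookup by key
            if entity is None:                 # the "doesn't exist" case: create
                entity = datastore.Entity(key=key)
            entity.update(data)                # update (or populate) the properties
            client.put(entity)                 # committed atomically with the transaction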

Upvotes: 0
