Reputation: 30809
I have the following BigTable structure for an example:
Table1 : column_family_1 : column_1 : value
The value here is a number, let's say. This is managed by a Dataflow pipeline and I want to update the value every time. This value might be an amount, and I want to update it every time the user makes a purchase (to maintain the total spent to date), so in the purchase event listener Dataflow I issue a Put request to update the value whenever a purchase event is encountered. Although this approach has some network latency, it seems to work. The scenario where this fails is when there are multiple workers in the Dataflow pipeline: the user makes more than one purchase, the events go to different workers, they each issue Put requests, and the updates get overwritten. To prevent this, I am trying to make a request which just says, in plain terms, "add 10 to the spent amount value". Is this something we can do in Dataflow?
Upvotes: 0
Views: 6284
Reputation: 1
Maybe you can try adding each transaction update as a separate column, using timestamps as qualifiers, so the total amount spent is simply the sum over all the columns. Periodically you can compact the N columns into one, and that update can be atomic.
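A minimal sketch of this pattern, using a plain Python dict as an in-memory stand-in for the Bigtable row (the qualifier naming is hypothetical, not a real client API):

```python
# Simulate a Bigtable row where each purchase is written as its own cell,
# qualified by a timestamp, so concurrent writers never overwrite each other.
import time

row = {}  # qualifier -> amount, mimicking column_family_1's cells

def record_purchase(amount, ts=None):
    """Write the purchase as a new cell; distinct qualifiers cannot collide."""
    qualifier = f"txn#{ts if ts is not None else time.time_ns()}"
    row[qualifier] = amount

def total_spent():
    """Read path: the total is just the sum over all transaction cells."""
    return sum(row.values())

def compact():
    """Periodic compaction: collapse the N cells into a single summed cell."""
    total = total_spent()
    row.clear()
    row["txn#compacted"] = total

record_purchase(10, ts=1)
record_purchase(25, ts=2)
assert total_spent() == 35
compact()
assert row == {"txn#compacted": 35}
```

The key property is that every write targets a distinct cell, so retried or concurrent writes cannot clobber each other; only the compaction step rewrites existing data.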
Upvotes: 0
Reputation: 61
Another solution could be the following:
- Use AbstractCloudBigtableTableDoFn to do a Put insert into the append-only table.
- Do another Put insert into that aggregated table.
That way, with only idempotent Puts issued through AbstractCloudBigtableTableDoFn, a consistent aggregate can be obtained.
Upvotes: 2
Reputation: 2711
Bigtable has the capability to Increment values. You can see more details in the protobuf documentation.
Idempotency plays an important role in understanding counters in Bigtable.
In Bigtable, Puts are generally idempotent, which means that you can run them multiple times and always get the same result (a=2 will produce the same result no matter how many times you run it). Increments are not idempotent, since running them multiple times will produce different results (a++, a++ has a different result than a++, a++, a++).
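The a=2 versus a++ distinction can be shown in a few lines, simulating a client that blindly retries after transient errors:

```python
# Retrying an idempotent write is safe; retrying an increment is not.
a = 0

def put_a(value):      # idempotent: same result however many times it runs
    global a
    a = value

def increment_a():     # not idempotent: each retry changes the result
    global a
    a += 1

for _ in range(3):     # simulate a client retrying after transient errors
    put_a(2)
assert a == 2          # running a=2 three times still yields 2

for _ in range(3):
    increment_a()
assert a == 5          # three retried a++ calls drifted the value from 2 to 5
```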
A transient failure may or may not have actually performed the Increment; it's never clear from the client side whether the Increment succeeded during those transient errors.
This Increment feature is complicated to use from Dataflow because of this idempotency issue. Dataflow has a concept of "bundles", which are sets of elements that act as a unit of work. Those bundles are retried on transient failures (you can read more about Dataflow transient failure retries here). Dataflow treats the "bundle" as a unit, but Cloud Bigtable has to treat each individual item in the "bundle" as a distinct transaction, since Cloud Bigtable does not support multi-row transactions.
Given this mismatch in the expected behavior of "bundles", Cloud Bigtable will not allow you to run Increments via Dataflow.
The options you have deserve more documentation than what I can provide here, but I can describe them at a high level:
1. Always use Put for any new event you find, and sum up the values on reads. You can also write another job that does periodic clean-up of rows by creating a "transaction" that deletes all current values and writes a new cell with the sum.
2. Use Cloud Functions, which listen to Pub/Sub events and perform Increments. Here's a Cloud Bigtable example using Cloud Functions. You can also perform a Get, do the addition, and issue a CheckAndMutate with the algorithm you describe in your post (I would personally opt for CheckAndMutate for consistency, if I were to choose this option).
3. Use AbstractCloudBigtableTableDoFn to write your own DoFn that performs Increments or CheckAndMutate, but with the understanding that this may cause data integrity problems.
If the system is large enough, option #1 is your most robust choice, but it comes at the cost of system complexity. If you don't want that complexity, option #2 is your next best bet (although I would opt for CheckAndMutate). If you don't care about data integrity and need high throughput (like "page counts" or other telemetry where it's OK to be wrong a small fraction of the time), then option #3 is going to be your best bet.
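The Get-then-CheckAndMutate flow from option #2 is essentially a compare-and-set loop. A minimal sketch with an in-memory stand-in for the Bigtable cell (the function names here are illustrative, not the real client API):

```python
# Read-modify-write with a CheckAndMutate-style compare-and-set: the write
# only lands if the cell still holds the value we read, so a concurrent
# writer's update is never silently overwritten.
store = {"spent": 0}

def check_and_mutate(expected, new_value):
    """Apply the write only if the cell still holds the value we read."""
    if store["spent"] == expected:
        store["spent"] = new_value
        return True
    return False

def add_purchase(amount):
    while True:                      # retry until our compare-and-set wins
        current = store["spent"]     # Get the current total
        if check_and_mutate(current, current + amount):
            return

add_purchase(10)
add_purchase(25)
assert store["spent"] == 35
```

Retrying this loop is safe in a way a bare Increment is not: a retry re-reads the cell first, so a previously successful write makes the stale compare fail instead of adding the amount twice.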
Upvotes: 5