MHS

Reputation: 41

High Performance Counters in Cloud Spanner

I'm looking to keep running counts of some items such as likes and comments on a post. The write rate can be high, e.g. 1K likes/sec.

Using a SELECT COUNT does not seem feasible, even if the result set is indexed, as there could be a few million rows to count.

I'm thinking of using a sharded counters approach where a specific counter (likes for a given post) consists of N shards/rows. Incrementing the counter would increment the column value of one shard's row, while reading the counter would read all shard rows and sum the count values. Would there be any issues in such an approach with Spanner?
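
Concretely, I am imagining something like the following sketch using the Go client, where the table name, columns, and shard count are all placeholders I made up:

    package counters

    import (
        "context"
        "math/rand"

        "cloud.google.com/go/spanner"
        "google.golang.org/grpc/codes"
    )

    const numShards = 32 // placeholder; would be tuned to the write rate

    // Assumed schema:
    //   CREATE TABLE LikeCounterShards (
    //     PostId  STRING(36) NOT NULL,
    //     ShardId INT64 NOT NULL,
    //     Count   INT64 NOT NULL,
    //   ) PRIMARY KEY (PostId, ShardId);

    // incrementLike adds 1 to one randomly chosen shard of a post's counter.
    func incrementLike(ctx context.Context, client *spanner.Client, postID string) error {
        shard := rand.Int63n(numShards)
        _, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
            var count int64
            row, err := txn.ReadRow(ctx, "LikeCounterShards", spanner.Key{postID, shard}, []string{"Count"})
            switch {
            case err == nil:
                if err := row.Column(0, &count); err != nil {
                    return err
                }
            case spanner.ErrCode(err) != codes.NotFound:
                return err
            }
            // Upsert so the shard row is created on first use.
            return txn.BufferWrite([]*spanner.Mutation{
                spanner.InsertOrUpdate("LikeCounterShards",
                    []string{"PostId", "ShardId", "Count"},
                    []interface{}{postID, shard, count + 1}),
            })
        })
        return err
    }

    // readLikes sums all shard rows for a post in one query.
    func readLikes(ctx context.Context, client *spanner.Client, postID string) (int64, error) {
        stmt := spanner.Statement{
            SQL:    `SELECT COALESCE(SUM(Count), 0) FROM LikeCounterShards WHERE PostId = @post`,
            Params: map[string]interface{}{"post": postID},
        }
        iter := client.Single().Query(ctx, stmt)
        defer iter.Stop()
        row, err := iter.Next()
        if err != nil {
            return 0, err
        }
        var total int64
        if err := row.Column(0, &total); err != nil {
            return 0, err
        }
        return total, nil
    }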

I understand that in Bigtable, multiple updates to the same row will create new versions of cells in the row and as a result, you can cause a row to exceed its size limit. So using rows as sharded counters in Bigtable seems to be a bad idea. Does Spanner have any similar issues?

Upvotes: 2

Views: 1132

Answers (3)

Bora

Reputation: 131

A recently GA'ed feature simplifies this: it offers write throughput and latency equivalent to regular writes, and it also supports multi-cluster routing. See https://cloud.google.com/blog/products/databases/distributed-counting-with-bigtable

When defining the column family, set its type to intsum (below it is set to never garbage-collect, but you can set that to whatever you want):

cbt createfamily mytable families=myfamily:never:intsum

Then add new values using addtocell:

cbt addtocell mytable myrowkey mycolumnfamily:mycolumnqualifier=myvalue@mytimestamp

The syntax is the same as for SetCell; just swap in addtocell.

You can do this using the client libraries as well, e.g.:

https://cloud.google.com/bigtable/docs/reference/data/rpc/google.bigtable.v2#google.bigtable.v2.Mutation.AddToCell
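
For instance, here is a minimal sketch with the Go client, assuming its AddIntToCell helper (the client-library analogue of the AddToCell RPC linked above); the project, instance, table, family, and row-key names are placeholders:

    package main

    import (
        "context"
        "log"

        "cloud.google.com/go/bigtable"
    )

    func main() {
        ctx := context.Background()
        client, err := bigtable.NewClient(ctx, "my-project", "my-instance")
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        tbl := client.Open("mytable")

        // AddIntToCell produces an AddToCell mutation: the server merges
        // the delta into the aggregate cell instead of appending a new
        // cell version.
        mut := bigtable.NewMutation()
        mut.AddIntToCell("myfamily", "likes", bigtable.Now(), 1)

        if err := tbl.Apply(ctx, "post#12345", mut); err != nil {
            log.Fatal(err)
        }
    }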

The feature also supports inthll (for approximate count distinct) and intmin/intmax (for min/max). It is meant for exactly the use cases you mentioned: you can put a post and its stats as columns in the same row, retrieve them together in a single query, and display them in your app.

Upvotes: 0

Ramesh Dharan

Reputation: 895

"I understand that in Bigtable, multiple updates to the same row will create new versions of cells in the row and as a result, you can cause a row to exceed its size limit. So using rows as sharded counters in Bigtable seems to be a bad idea. Does Spanner have any similar issues?"

As noted in the comments, you could use the ReadModifyWrite Increment API, with the caveat that row-transactional operations like ReadModifyWrite are slower in Bigtable than simple writes.
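
With the Go client that looks roughly like this (family, column, and key names are placeholders):

    package counters

    import (
        "context"

        "cloud.google.com/go/bigtable"
    )

    // incrementRMW bumps a counter cell atomically via ReadModifyWrite.
    func incrementRMW(ctx context.Context, tbl *bigtable.Table, rowKey string) (bigtable.Row, error) {
        rmw := bigtable.NewReadModifyWrite()
        rmw.Increment("counts", "likes", 1)
        // Row-transactional, so slower than a plain write and only
        // available with single-cluster routing.
        return tbl.ApplyReadModifyWrite(ctx, rowKey, rmw)
    }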

However, using multiple rows to represent a single counter and then reading the rows together using a prefix scan should be fine.

The key would be to use arbitrary prefixes on the row key to distribute writes across the nodes in your cluster and avoid hotspotting.
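
Here is a rough sketch of the read side in Go, assuming the shard index is a row-key suffix (e.g. post#12345#07) so that one prefix covers all of a counter's shards, and that the per-shard counts are stored as big-endian uint64 values; the family name is a placeholder:

    package counters

    import (
        "context"
        "encoding/binary"

        "cloud.google.com/go/bigtable"
    )

    // readCounter sums the per-shard counts that share one key prefix,
    // e.g. prefix "post#12345#" matching rows post#12345#00..post#12345#31.
    func readCounter(ctx context.Context, tbl *bigtable.Table, prefix string) (int64, error) {
        var total int64
        err := tbl.ReadRows(ctx, bigtable.PrefixRange(prefix),
            func(r bigtable.Row) bool {
                for _, cell := range r["counts"] { // assumed column family
                    total += int64(binary.BigEndian.Uint64(cell.Value))
                }
                return true // continue scanning
            },
            bigtable.RowFilter(bigtable.LatestNFilter(1))) // newest version only
        return total, err
    }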

Upvotes: 3

Biswa Nag

Reputation: 141

Sharding the counters to improve parallelism seems like a good idea. Cloud Spanner manages older versions of the data differently than Bigtable, so you may not hit the same limits: Spanner keeps old versions around for one hour. However, you may want to take care in designing your schema to avoid hotspots.
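
For example (illustrative only, not a recommendation for your exact schema), where the shard lands in the primary key determines how a single hot post's writes spread:

    package counters

    // With PostId first, all shards of one post form a contiguous key
    // range, though Spanner can still split within it under load. Putting
    // the shard (or a hash of the key) first spreads a hot post's writes
    // more eagerly, at the cost of a slightly more complex read.
    const shardedCounterDDL = `
    CREATE TABLE LikeCounterShards (
      PostId  STRING(36) NOT NULL,
      ShardId INT64 NOT NULL,  -- e.g. 0..31, chosen at random per write
      Count   INT64 NOT NULL,
    ) PRIMARY KEY (PostId, ShardId)`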

I would recommend, though, that you try implementing an in-memory caching layer on top of Spanner. This could be used to:

  1. Batch some updates together (see the sketch after this list).
  2. Do fast reads/counts.

There is a tradeoff in that you could lose some updates if the cache goes away, but that may be acceptable when all you are caching is like counts.
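
A rough sketch of the batching idea in Go (all names are hypothetical; the flush function would run a sharded update like the one in your question, adding the whole delta at once):

    package counters

    import (
        "context"
        "sync"
        "time"
    )

    // CounterCache buffers increments in memory and flushes them in batches.
    type CounterCache struct {
        mu      sync.Mutex
        pending map[string]int64                 // postID -> buffered delta
        flush   func(postID string, delta int64) // e.g. a sharded Spanner update
    }

    func NewCounterCache(flush func(string, int64)) *CounterCache {
        return &CounterCache{pending: make(map[string]int64), flush: flush}
    }

    // Inc records one like; nothing is written to Spanner yet.
    func (c *CounterCache) Inc(postID string) {
        c.mu.Lock()
        c.pending[postID]++
        c.mu.Unlock()
    }

    // Run flushes buffered deltas every interval, collapsing many per-like
    // writes into one write per post per interval. Anything buffered since
    // the last flush is lost if the process dies.
    func (c *CounterCache) Run(ctx context.Context, interval time.Duration) {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                c.mu.Lock()
                batch := c.pending
                c.pending = make(map[string]int64)
                c.mu.Unlock()
                for postID, delta := range batch {
                    c.flush(postID, delta)
                }
            }
        }
    }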

Upvotes: 2
