user01380121

Reputation: 527

Performance of Cloud Bigtable row counts

I'd like to gauge Cloud Bigtable's performance at retrieving row counts for keys with certain prefixes.

Say a schema has row keys with unix timestamp at the end, e.g., event_id#unix_timestamp.

If I needed to get the total number of rows for each of 20 different event_ids, is Cloud Bigtable efficient at this? I'd use either a prefix or a row range query.

Upvotes: 2

Views: 2128

Answers (2)

Igor Bernstein

Reputation: 581

Cloud Bigtable doesn't have a count operation; you have to read the rows by key prefix and use a filter to minimize the amount of data returned per row. For example:

// One range per key prefix; the filter keeps the response to little
// more than the row keys.
rowSet := bigtable.RowRangeList{
  bigtable.PrefixRange("event_id#"),
  // ... one PrefixRange per event_id
}
filter := bigtable.ChainFilters(bigtable.CellsPerRowLimitFilter(1), bigtable.StripValueFilter())

count := 0
err := table.ReadRows(ctx, rowSet, func(r bigtable.Row) bool {
  count++
  return true
}, bigtable.RowFilter(filter))
if err != nil {
  // handle the read error
}
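
If you need a separate count per event_id rather than one grand total, the same scan can bucket by key prefix. A minimal sketch, assuming row keys of the form event_id#unix_timestamp; the counts map and the strings.SplitN parsing are illustrative, not part of the client API:

// Reuses rowSet and filter from above; also assumes `import "strings"`.
counts := make(map[string]int)
err := table.ReadRows(ctx, rowSet, func(r bigtable.Row) bool {
  eventID := strings.SplitN(r.Key(), "#", 2)[0]
  counts[eventID]++
  return true
}, bigtable.RowFilter(filter))
if err != nil {
  // handle the read error
}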

Upvotes: 5

Solomon Duskis

Reputation: 2711

The Cloud Bigtable service handles this type of query well, and the Go client library also performs quite well.

Timestamp queries are a bit tricky to get right. Generally, users of time series want queries like "get me the latest N values". Bigtable returns rows in increasing key order only, so you'll need a schema where the key is event_id#{max int64 - unix_timestamp}. You would also need LimitRows to get the latest N.
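
A minimal sketch of that reversed-timestamp key with the Go client; the column family "cf", column "col", and the limit of 10 are illustrative assumptions, not from the answer:

// Assumes `import ("fmt"; "math"; "time")` plus the bigtable client.
// Write side: reverse the timestamp so newer rows sort first, and
// zero-pad so the keys sort lexicographically.
reversed := math.MaxInt64 - time.Now().Unix()
rowKey := fmt.Sprintf("event_id#%019d", reversed)
mut := bigtable.NewMutation()
mut.Set("cf", "col", bigtable.Now(), []byte("value"))
if err := table.Apply(ctx, rowKey, mut); err != nil {
  // handle the write error
}

// Read side: the first N rows under the prefix are now the N most recent.
err := table.ReadRows(ctx, bigtable.PrefixRange("event_id#"),
  func(r bigtable.Row) bool {
    // process one of the latest rows
    return true
  },
  bigtable.LimitRows(10))
if err != nil {
  // handle the read error
}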

With Cloud Bigtable, it's important to ask what you'll be doing with the data, since that will inform your choice of schema.

Cloud Bigtable has a "discuss" group for general discussion, and GitHub repositories for language-specific feature requests and questions. You can find more information at https://cloud.google.com/bigtable/docs/support/getting-support.

Upvotes: 5
