Haden Hooyeon Lee

Reputation: 305

Unable to count the number of rows in BigTable

https://cloud.google.com/bigtable/docs/go/cbt-reference

As described in this reference, I tried the following command

cbt count <table>

for three different tables.

For one of them I got what I expected: the number of rows, a bit shy of 1M.

For the second table, I got the following error:

[~]$ cbt count prod.userprofile
2016/10/23 22:47:48 Reading rows: rpc error: code = 4 desc = Error while reading table 'projects/focal-elf-631/instances/campaign-stat/tables/prod.userprofile'
[~]$ cbt count prod.userprofile
2016/10/23 23:00:23 Reading rows: rpc error: code = 4 desc = Error while reading table 'projects/focal-elf-631/instances/campaign-stat/tables/prod.userprofile'

I tried it several times, but I got the same error every time.

For the last one, I got a different error (the error code is the same as above, but its description is different):

[~]$ cbt count prod.appprofile
2016/10/23 22:45:17 Reading rows: rpc error: code = 4 desc = Error while reading table 'projects/focal-elf-631/instances/campaign-stat/tables/prod.appprofile' : Response was not consumed in time; terminating connection. (Possible causes: row size > 256MB, slow client data read, and network problems)
[~]$ cbt count prod.appprofile
2016/10/23 23:11:10 Reading rows: rpc error: code = 4 desc = Error while reading table 'projects/focal-elf-631/instances/campaign-stat/tables/prod.appprofile' : Response was not consumed in time; terminating connection. (Possible causes: row size > 256MB, slow client data read, and network problems)

I also tried this one several times, and nothing changed.

I googled and searched Stack Overflow using 'rpc error code 4' as keywords, but did not find anything useful.

I'm really curious why this command fails, and what I can do to resolve it. (By the way, these two tables are serving production traffic 24/7, and we have several dozen Bigtable nodes working just fine, so I don't think it's a bandwidth or QPS issue.)

Upvotes: 2

Views: 7959

Answers (3)

ashwaniKumar

Reputation: 87

You can try the following command:

cbt -project <name_of_project> -instance <name_of_instance> count <name_of_table>

Upvotes: 0

danius

Reputation: 2774

As an alternative, one possible approach (although not the best one) is to use atomic counters, that is:

  1. Happybase: https://google-cloud-python-happybase.readthedocs.io/en/latest/happybase-table.html#google.cloud.happybase.table.Table.counter_inc
  2. Native API: https://googlecloudplatform.github.io/google-cloud-python/stable/bigtable-row.html#google.cloud.bigtable.row.AppendRow.increment_cell_value

If you design a second table as a secondary index of counters, it can perform well under certain conditions (as long as you don't blast the counters with simultaneous reads and writes, or run into heavy counter read/write load because of hotspotting).
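A minimal sketch of that counter pattern in Python, assuming the google-cloud-happybase client from the first link; the project, instance, table, and column names below are all hypothetical:

```python
def bump_and_read(table, delta=1):
    """Atomically increment the shared row counter and return the new total.

    `table` is expected to expose the happybase counter API
    (counter_inc / counter_get), as in the links above.
    """
    table.counter_inc(b'totals', b'stats:row_count', value=delta)
    return table.counter_get(b'totals', b'stats:row_count')

if __name__ == '__main__':
    # Cloud setup is deferred here so the helper stays importable
    # without credentials. All names are hypothetical.
    from google.cloud import bigtable, happybase

    client = bigtable.Client(project='my-project', admin=True)
    connection = happybase.Connection(instance=client.instance('my-instance'))
    counters = connection.table('counters')

    # Every write to the main table also bumps the counter, so the
    # row count becomes a single-cell read instead of a full scan.
    print(bump_and_read(counters))
```

The trade-off is that every writer must remember to bump the counter, and a single counter cell is itself a hotspot under heavy write load, which is exactly the hotspotting caveat above.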

Nevertheless, Map/Reduce is definitely a more robust solution, as @solomon-duskis proposed.

Upvotes: 0

Solomon Duskis

Reputation: 2711

Getting a count on a large table requires reading something from every single row in Bigtable. There isn't a notion of just getting a single value that represents a count.

This type of problem requires something like a map/reduce, unfortunately. Fortunately, it's quite straightforward to do a count with Dataflow.
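For a one-off count, a sketch of the full scan in Python, assuming the google-cloud-bigtable client: requesting one cell per row and stripping its value means only row keys travel over the wire, which keeps each response small and may help avoid the "Response was not consumed in time" error from the question. The project, instance, and table names are hypothetical:

```python
def count_rows(rows):
    """Count an iterable of rows without materialising them in memory."""
    return sum(1 for _ in rows)

if __name__ == '__main__':
    # Cloud setup is deferred so the counting helper stays importable.
    from google.cloud import bigtable
    from google.cloud.bigtable.row_filters import (
        CellsColumnLimitFilter,
        RowFilterChain,
        StripValueTransformerFilter,
    )

    client = bigtable.Client(project='my-project')
    table = client.instance('my-instance').table('my-table')

    # Keep at most one cell per row and drop its value, so the scan
    # streams only row keys.
    keys_only = RowFilterChain(filters=[
        CellsColumnLimitFilter(1),
        StripValueTransformerFilter(True),
    ])

    print(count_rows(table.read_rows(filter_=keys_only)))
```

This still reads something from every row, so for very large tables run it from a machine close to the cluster, or move the scan into a Dataflow job as suggested above.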

Upvotes: 4
