Reputation: 2153
In https://cloud.google.com/bigtable/docs/schema-design it is clearly described how to choose the row key of a table. But I could not find any info on how to compose this row key. Where, and by what means, is it composed?
Upvotes: 3
Views: 7097
Reputation: 8119
Composing the row key is an essential part of your data model design in Bigtable (BT). When you design your table, you need to know in advance how the data will be fetched in order to optimize your use of BT.
BT is best at fetching a single row by its rowkey. So if you can design your table (and application's data model) to always fetch by a specific key, that's awesome.
For example, every user could have their userid (say "1234") as the rowkey, and you always fetch by userid ('fetch userid "1234"'). Simple.
But things are not always this easy. In some data models, the data you need is dispersed across multiple rows. This is less ideal than the previous case, but can still work. This fetching pattern is called "scanning" - and BT supports it, but with an important caveat: it can only fetch all the rows that share a common prefix. In addition, scanning is also slower when compared to fetching-by-rowkey.
In the context of the example above, scanning would mean 'getting all the rows with userid that begins with "123"'. Of course, this doesn't make a lot of sense: you seldom need to pull userids by a shared prefix. But in some data models it might make sense to use scans. For example, if your rowkey is a timestamp, and you want to fetch all the items between two timestamps.
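To make the prefix-scan idea concrete: a prefix scan is equivalent to a range scan whose end key is the prefix with its last byte incremented. Here is a minimal plain-Python sketch; the helper name and the in-memory "table" are illustrative, not part of the Bigtable API:

```python
from typing import Optional

def prefix_to_range(prefix: bytes) -> tuple[bytes, Optional[bytes]]:
    """Return (start_key, end_key) covering all row keys with the given prefix.

    The end key is the prefix with its last byte incremented; trailing 0xff
    bytes are dropped first, since they cannot be incremented.
    """
    start = prefix
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:  # empty or all-0xff prefix: scan to the end of the table
        return start, None
    end = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return start, end

# Simulate a lexicographically sorted table of row keys.
rows = sorted([b"1229", b"1230", b"1234", b"1239", b"124", b"200"])
start, end = prefix_to_range(b"123")
matched = [k for k in rows if start <= k and (end is None or k < end)]
# matched now holds exactly the rows whose key begins with b"123"
```

This is also why prefix scans map cleanly onto Bigtable's sorted storage: the matching rows are contiguous, so the server reads one key range.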
You can take this to the next level by composing your rowkey from different fields - and thus allowing for more interesting scans:
If your data model calls for fetches by userid AND a timestamp range, you can compose the rowkey as "userid-timestamp" and then fetch all the rows beginning at "1234-(start-timestamp)" and ending at "1234-(end-timestamp)". I believe this is what is meant by "composing" the row key.
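A sketch of that composition in Python (the "-" delimiter and 12-digit zero-padding are illustrative choices, not mandated by Bigtable). Zero-padding the timestamp matters: it keeps lexicographic order consistent with numeric order, so a key range corresponds to a time range:

```python
def make_rowkey(userid: str, timestamp: int) -> bytes:
    # Zero-pad the timestamp so lexicographic order matches numeric order.
    return f"{userid}-{timestamp:012d}".encode()

# Range scan for userid "1234" between two timestamps:
start_key = make_rowkey("1234", 1600000000)
end_key = make_rowkey("1234", 1600086400)

# Simulated sorted table: two rows fall inside the range, two outside.
rows = sorted([
    make_rowkey("1234", 1599999999),
    make_rowkey("1234", 1600000500),
    make_rowkey("1234", 1600086399),
    make_rowkey("1235", 1600000500),
])
in_range = [k for k in rows if start_key <= k < end_key]
```

With a real client you would hand `start_key` and `end_key` to a row-range read instead of filtering in memory.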
(Another thing to keep in mind, as mentioned in another answer here, is that you want to avoid creating "hotspots" in your table: popular rows should not be stored next to each other, so that updates and fetches can be spread out rather than piling onto one node. Because rowkeys are stored in BT in lexicographic order, one way to avoid hotspots is to choose or compose rowkeys in such a way that the popular ones are not lexicographically near each other.)
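One common way to spread popular keys apart is to prefix each key with a deterministic "salt" derived from the key itself. A sketch (the salt count and key layout are arbitrary illustrative choices): the salt is recomputable from the userid, so point lookups still work, but neighboring userids land in different parts of the keyspace.

```python
import hashlib

NUM_SALTS = 4  # illustrative; chosen based on expected write throughput

def salted_rowkey(userid: str, timestamp: int) -> bytes:
    # A deterministic salt derived from the userid spreads users across
    # NUM_SALTS distinct key ranges, breaking up lexicographic clustering.
    salt = int(hashlib.md5(userid.encode()).hexdigest(), 16) % NUM_SALTS
    return f"{salt}-{userid}-{timestamp:012d}".encode()
```

The trade-off is that a scan over all users now requires NUM_SALTS separate range scans, one per salt prefix.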
Upvotes: 6
Reputation: 26
Here's a Python code sample that creates a sample row key and adds it to a list of keys returned from a CBT table:
from collections import namedtuple

# "keys" is the list of sample row keys previously returned by the table.
SampleRowKey = namedtuple("SampleRowKey", "row_key offset_bytes")
keys.insert(0, SampleRowKey(b'', 0))
Hope this helps.
Upvotes: 0
Reputation: 92
Row keys are simple byte strings, and the table is kept sorted by them. A row key can be composed of multiple parts concatenated together. Consider which parts belong in the key and in what order, since that determines how much content will go into each row (there is a limit of 100MB per row) and how efficiently contiguous rows can be scanned.
The Cloud Bigtable documentation for schema design is here: https://cloud.google.com/bigtable/docs/schema-design
The same concepts apply for HBase and similar databases: https://mapr.com/blog/guidelines-hbase-schema-design/
Upvotes: 1
Reputation: 581
I'm not sure I understand your question, but I'll try to shed some light on row keys in general. Unlike SQL tables, you don't need to create a primary key column; Bigtable tables already have the concept of a primary key built in. You just need to decide what you want to store in it. Implementation-wise, Bigtable doesn't try to interpret the keys and treats them as a byte array.
The values, on the other hand, need at least one column family to be created before inserting data. You can create column families using the cbt
command-line tool. You can find instructions for installing it here:
https://cloud.google.com/bigtable/docs/go/cbt-overview
And general information about managing tables here:
https://cloud.google.com/bigtable/docs/managing-tables.
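For reference, creating a table and a column family with cbt looks roughly like this (the project, instance, table, and family names are placeholders):

```shell
# Project and instance can also be set once in ~/.cbtrc instead of flags.
cbt -project my-project -instance my-instance createtable my-table
cbt -project my-project -instance my-instance createfamily my-table cf1

# Write a value (column qualifier "greeting" in family cf1) and read it back:
cbt -project my-project -instance my-instance set my-table r1 cf1:greeting=hello
cbt -project my-project -instance my-instance lookup my-table r1
```

Note that only the column family (cf1) has to exist ahead of time; column qualifiers like "greeting" are created on the fly as you write.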
Upvotes: 3