Why are HBase rows said to be stored in lexicographically sorted order?

Question

According to the HBase documentation, which in turn follows the Google Bigtable paper, rows are stored sorted lexicographically by row key.

This is evident when the row key is a string, or a string converted to a byte array. It even makes sense if you convert an integer to a string and then to a byte array. For example, the HBase shell session below takes each number as a string and stores it:

create 'test', 'cf'
put 'test', '1', 'cf:c1', 'xyz1'
put 'test', '2', 'cf:c1', 'xyz2'
put 'test', '11', 'cf:c1', 'xyz11'

scan 'test'
ROW                                         COLUMN+CELL
 1                                          column=cf:c1, timestamp=1589736288540, value=xyz1
 11                                         column=cf:c1, timestamp=1589736311607, value=xyz11
 2                                          column=cf:c1, timestamp=1589736301167, value=xyz2
3 row(s) in 0.0080 seconds

On the other hand, I can convert the number to a byte array programmatically using the HBase client utilities (org.apache.hadoop.hbase.util.Bytes, which uses a big-endian encoding), and then I see that the rows are sorted in natural order, not in the lexicographic order shown above. For similar data and a similar table, I used the code below to put the data into the HBase table.

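import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
// `table` is assumed to be an already opened org.apache.hadoop.hbase.client.Table
val put = new Put(Bytes.toBytes(11L))
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("c1"), Bytes.toBytes("abc"))
table.put(put)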

The scan result is

hbase(main):014:0> scan 'test2'
ROW                                        COLUMN+CELL
 \x01                                      column=cf:a, timestamp=1589727058289, value=abc \1
 \x02                                      column=cf:a, timestamp=1589727099714, value=abc \2
 \x0B                                      column=cf:a, timestamp=1589727147449, value=abc \11
 {                                         column=cf:a, timestamp=1589733907127, value=abc \123
 \xF8                                      column=cf:a, timestamp=1589733854179, value=abc \112312312L
5 row(s) in 0.0080 seconds

My question is:
Is it pure coincidence that the lexicographic ordering of the byte arrays generated from integers is the same as their natural ordering, or does the long-to-byte-array conversion pad the values so that lexicographic order effectively matches natural order?
If not, do we say that row keys are sorted lexicographically so that, when string keys are mixed with keys of other data types, the sort still has a predetermined order? In that case, in my opinion, it is not accurate to say that row keys are sorted in strictly lexicographic order; it is defined that way only to accommodate untyped row keys.

Basically, does the byte encoding Bytes.toBytes(long) preserve the natural ordering of Long? That is, will the lexicographic ordering of the Array[Byte] values the function returns be the same as the natural ordering of the Long inputs?

Diego Sevilla · Accepted Answer

The answer to your question is yes, but be careful if you mix keys of different sizes. If all the keys have the same size and are all generated with Bytes.toBytes(long), they keep the natural long order. (This holds for non-negative values: a negative long has its sign bit set, so its bytes compare as larger than those of any positive long.) That won't be the case if you mix byte arrays of different sizes because, as you show, the one-byte key '1' ends up right next to the two-byte key '11', ahead of '2'.
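To make the mixed-size caveat concrete, here is a minimal sketch that sorts string keys the way a byte-wise comparator would. It does not call HBase: `StringKeyOrder`, `compareBytes` and `sortKeys` are hypothetical names introduced for illustration, and the comparison rule (unsigned bytes, with a shorter prefix sorting first) is my assumption of how row keys are compared.

```scala
import java.nio.charset.StandardCharsets

object StringKeyOrder {
  // Byte-wise lexicographic comparison with each byte read as unsigned (0..255).
  // If one array is a prefix of the other, the shorter one sorts first.
  def compareBytes(a: Array[Byte], b: Array[Byte]): Int =
    a.zip(b).find { case (x, y) => x != y } match {
      case Some((x, y)) => (x & 0xff) - (y & 0xff)
      case None         => a.length - b.length
    }

  // Sort string keys by their UTF-8 bytes rather than by numeric value.
  def sortKeys(keys: Seq[String]): Seq[String] =
    keys.sortWith { (l, r) =>
      compareBytes(l.getBytes(StandardCharsets.UTF_8), r.getBytes(StandardCharsets.UTF_8)) < 0
    }

  def main(args: Array[String]): Unit = {
    println(sortKeys(Seq("1", "2", "11")))   // List(1, 11, 2)
    println(sortKeys(Seq("01", "02", "11"))) // List(01, 02, 11)
  }
}
```

Zero-padding the string keys to a common width is one way to get numeric order back while still storing strings.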

Bytes.toBytes(long) uses a fixed-length big-endian encoding (eight bytes for a long). Using four bytes to keep the illustration short, the ordering would be:

00 00 00 00 (long value 0)
00 00 00 01 (long value 1)
00 00 00 02
...
00 00 01 00 (long value 256)
...

so the lexicographic order of the generated keys is the same as the natural order of the numbers.
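The fixed-width argument can be checked with a small sketch that needs no HBase cluster. It assumes that `java.nio.ByteBuffer` (big-endian by default) produces the same 8-byte layout as Bytes.toBytes(long); `LongKeyOrder`, `encode`, `compareBytes` and `sortByKey` are hypothetical helpers, not part of the HBase API.

```scala
import java.nio.ByteBuffer

object LongKeyOrder {
  // Stand-in for Bytes.toBytes(long) (assumption): 8 bytes, most significant first.
  def encode(v: Long): Array[Byte] = ByteBuffer.allocate(8).putLong(v).array()

  // Lexicographic comparison with each byte read as unsigned (0..255).
  // If one array is a prefix of the other, the shorter one sorts first.
  def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val x = a(i) & 0xff
      val y = b(i) & 0xff
      if (x != y) return x - y
      i += 1
    }
    a.length - b.length
  }

  // Sort longs by the byte order of their encoded keys, not by numeric value.
  def sortByKey(values: Seq[Long]): Seq[Long] =
    values.sortWith((l, r) => compareBytes(encode(l), encode(r)) < 0)

  def main(args: Array[String]): Unit = {
    println(sortByKey(Seq(256L, 11L, 2L, 1L, 123L))) // List(1, 2, 11, 123, 256)
    println(sortByKey(Seq(5L, -1L, 0L)))             // List(0, 5, -1)
  }
}
```

The second call shows the one place where byte order and numeric order diverge: -1 encodes as eight 0xFF bytes, so it sorts after every non-negative value. If negative keys matter, a common workaround is to flip the sign bit before encoding.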
