Reputation: 382
I have a table in HBase whose row IDs are 25 characters long. I observed that when the row ID is shorter (around 10 characters), the reduce phase runs a little faster than with 25-character row IDs. So I thought of using the hash code of the 25-character String as the row ID. Is it OK to use the generated hash code as the row ID in an HBase table?
Note that String.hashCode() returns an int (a 32-bit value, so about 4.3 billion possible values), and my table has around 200 million records.
Upvotes: 0
Views: 374
Reputation: 690
Java's hashCode() method returns a 32-bit integer. Many classes override it to compute the value from their instance data, so that hash codes are spread more evenly.
Since Java 1.2, java.lang.String implements hashCode() as a product sum over the entire text of the string: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
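To make the product sum concrete, here is a minimal sketch that recomputes it by hand and compares the result with the built-in method. The class name, method name, and sample row ID are made up for illustration.

```java
public class ProductSumHash {
    // Recomputes the formula documented for String.hashCode():
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], using int arithmetic
    // so that overflow wraps around exactly as it does in the JDK.
    static int productSum(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String rowId = "customer-2013-04-000123"; // hypothetical row ID
        // Compares the hand-rolled value with the built-in one.
        System.out.println(productSum(rowId) == rowId.hashCode());
    }
}
```

Because the result is folded into 32 bits, many different strings necessarily map to the same value.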
Even with this approach, collisions are always possible. They are very harmful in a row key, so this should be avoided.
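A rough way to see how likely collisions are at this scale is the birthday approximation: with n keys spread over m possible values, the expected number of colliding pairs is about n(n-1)/(2m). The sketch below plugs in the 200 million rows from the question and the 2^32 possible int values. It assumes the hash codes are uniformly distributed, which real row IDs may not satisfy, and the class and method names are illustrative.

```java
public class CollisionEstimate {
    // Birthday-style approximation of the expected number of colliding
    // pairs when n keys are hashed uniformly into m possible values.
    static double expectedCollidingPairs(double n, double m) {
        return n * (n - 1) / (2.0 * m);
    }

    public static void main(String[] args) {
        double rows = 200_000_000d;          // record count from the question
        double hashValues = Math.pow(2, 32); // distinct values of a Java int
        System.out.printf("%.0f%n", expectedCollidingPairs(rows, hashValues));
    }
}
```

Under that assumption the estimate comes out at several million colliding pairs, so at 200 million rows collisions are practically guaranteed rather than a rare edge case.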
Upvotes: 0
Reputation: 34184
Although HBase doesn't stop you from doing that, I don't think it would be a wise decision. There may be hash code collisions, which will lead to improper inserts: two different records will go to the same row as different versions.
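The effect can be sketched without a cluster. The example below uses a plain in-memory map as a stand-in for the table (it is not the HBase client API), keyed by hashCode(), with a list playing the role of the versions stored under one row. The strings "Aa" and "BB" are a well-known pair with the same hash code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowKeyCollisionDemo {
    // Stand-in for a table: row key (the hash code) -> values written to that row.
    static Map<Integer, List<String>> store(String... originalIds) {
        Map<Integer, List<String>> table = new HashMap<>();
        for (String id : originalIds) {
            // Records whose IDs hash to the same int end up under the same row key.
            table.computeIfAbsent(id.hashCode(), k -> new ArrayList<>()).add(id);
        }
        return table;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> table = store("Aa", "BB", "Cc");
        // Three records were written, but "Aa" and "BB" share one row.
        System.out.println("rows: " + table.size());
        System.out.println("row " + "Aa".hashCode() + ": " + table.get("Aa".hashCode()));
    }
}
```

Three distinct records produce only two rows here, and a reader asking for the latest version of the shared row would see only one of the two colliding records.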
Upvotes: 2