Reputation: 251
We have created tables in HBase and mapped them to Hive using the HBase storage handler. If the tables hold a huge number of records, say 100 million, and we need to join two of them on columns that are not the row key, how will the join perform? Is there any way to improve join performance for Hive tables mapped to HBase?
Regards, GHK.
Upvotes: 3
Views: 1931
Reputation: 1401
The underlying storage does not matter for Hive JOIN performance, so the HBase row key does not help you with a Hive join.
One trick you can use is a map join (mapjoin), which works very well when you are joining a small table with a huge one.
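To show why a map join helps in the small-table/huge-table case, here is a minimal conceptual sketch in plain Python. It is not Hive's implementation: the function name `map_side_join` and the sample `countries` and `events` rows are hypothetical, made up for illustration. The idea is that the small table is loaded into an in-memory hash table once, and the huge table is then streamed past it in a single pass, so the huge table never has to be shuffled and sorted for the join.

```python
from collections import defaultdict


def map_side_join(small_rows, big_rows, small_key, big_key):
    """Conceptual map-side (hash) join: index the small table in memory,
    then stream the big table and probe the index row by row."""
    # Build phase: hash the small table on its join column.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[small_key]].append(row)

    # Probe phase: one pass over the big table, no sort or shuffle needed.
    for big in big_rows:
        for small in lookup.get(big[big_key], ()):
            yield {**small, **big}


# Hypothetical sample data: a small dimension table and a "huge" fact table.
countries = [
    {"country_code": "IN", "country_name": "India"},
    {"country_code": "US", "country_name": "United States"},
]
events = [
    {"event_id": 1, "country_code": "IN"},
    {"event_id": 2, "country_code": "US"},
    {"event_id": 3, "country_code": "XX"},  # no match, dropped (inner join)
]

joined = list(map_side_join(countries, events, "country_code", "country_code"))
```

This only works when the small side fits in memory, which is why the trick applies to a small table joined with a huge one and not to two 100-million-row tables joined with each other.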
You can read more about Hive JOIN performance here: https://www.facebook.com/notes/facebook-engineering/join-optimization-in-apache-hive/470667928919
Upvotes: 2