Jaison Thomas

Reputation: 31

HBase Inner join and coprocessors

I am planning a project to implement all aggregation operations in HBase, but I don't know how difficult it will be. I have only 6 months to complete it. Should I go forward with it? I am planning to do it in Java. I know that some aggregation functions already exist, but there are currently no INNER JOIN-like queries, and I am planning to implement that type of query. I don't know whether it's a blunder or a bluff.

Upvotes: 0

Views: 741

Answers (2)

David Gruzman

Reputation: 8088

I think technically we should distinguish two types of joins:
a) One small table + one big table. By a small table I mean a table that can be cached in the memory of each node without seriously affecting cluster operation. In this case a join using a coprocessor should be possible: put the small table in a hash map, iterate over the node-local part of the big table's data, and produce the join results as you go. In Hive's terms this is called a "map" join http://www.facebook.com/note.php?note_id=470667928919.
b) Two big tables. I do not think it is viable to get this to production quality in a short time frame. I would say that such functionality is the realm of MPP databases and a serious part of their IP.
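To make case (a) concrete, here is a minimal sketch of the hash-map join idea in plain Java. It deliberately uses no HBase API: `MapJoinSketch`, `BigRow`, and the comma-separated output format are hypothetical names chosen for illustration, and the sketch assumes the small table has one value per join key. In a real coprocessor, the loop would run over the rows of the local region instead of an in-memory list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a map-side (hash) inner join with no HBase dependencies.
class MapJoinSketch {

    // Hypothetical shape of a big-table row: row key, join key, payload.
    static final class BigRow {
        final String rowKey;
        final String joinKey;
        final String payload;

        BigRow(String rowKey, String joinKey, String payload) {
            this.rowKey = rowKey;
            this.joinKey = joinKey;
            this.payload = payload;
        }
    }

    // smallTable: join key -> value, assumed small enough to cache on every node.
    // localBigRows: the part of the big table that lives on this node.
    static List<String> join(Map<String, String> smallTable, Iterable<BigRow> localBigRows) {
        List<String> results = new ArrayList<>();
        for (BigRow row : localBigRows) {
            // One hash lookup per big-table row; no shuffle or sort is needed.
            String match = smallTable.get(row.joinKey);
            if (match != null) {
                // Inner join: rows without a match in the small table are dropped.
                results.add(row.rowKey + "," + row.payload + "," + match);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> departments = new HashMap<>();
        departments.put("d1", "Sales");
        departments.put("d2", "Support");

        List<BigRow> employees = new ArrayList<>();
        employees.add(new BigRow("e1", "d1", "Ann"));
        employees.add(new BigRow("e2", "d9", "Bob")); // no matching department
        employees.add(new BigRow("e3", "d2", "Cid"));

        for (String line : join(departments, employees)) {
            System.out.println(line);
        }
    }
}
```

Each node only needs its own copy of the small table, which is why this approach stops working once both tables are too large to cache (case b).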

Upvotes: 1

Paul M

Reputation: 2046

It is definitely harder to do in HBase than in an RDBMS or in a different Hadoop technology such as Pig or Hive.

Upvotes: 0
