GHK

Reputation: 251

How To Create Partitions In An HBase Table Like Hive Table Partitions

We are planning to migrate from CDH3 to CDH4. As part of this migration we also plan to bring HBase into our system, because it supports updates to the data. In CDH3 we use Hive as our warehouse.

This is where we hit a major problem in the migration: Hive supports table partitions. Our system has many tables in different schemas, and some of them are partitioned by date. We have data history for the last 5 years, so some tables have 365 * 5 partitions.

We want to achieve the same behavior in HBase, but when I searched I couldn't find a way to create partitions in HBase. Can anyone help me implement partitioned table creation in HBase?

The reason we are going for HBase is that it supports updates.

If HBase does not support this, which other store (such as MongoDB or Cassandra) supports this behavior?

Even a workaround would be a great help.

Upvotes: 5

Views: 13191

Answers (2)

Tariq

Reputation: 34184

I'm afraid you can't partition data in HBase the way you do in Hive. The two tools are quite different from each other in both design and behavior. Data in HBase is, in a sense, already partitioned for you, since HBase partitions the key space and each partition is what we call a region. If you still need finer-grained partitioning, you could achieve that by using column families wisely.

For example, you could have a column family for each year, which would give you a table with 5 column families.


Edit:

If you need something like what you mentioned in your last comment, you can create a pre-split table. You can choose the start and end row keys for the regions as you see fit. For example, with one partition per day, the first and last row keys of that day would mark the start and end boundaries of that particular region.
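As a rough sketch of the pre-split idea, the split points for one region per day can be generated up front. The `YYYYMMDD` row-key prefix and the helper name `daily_split_keys` are assumptions made for illustration; HBase does not prescribe either.

```python
from datetime import date, timedelta
from typing import List


def daily_split_keys(first_day: date, last_day: date) -> List[str]:
    """Return one split point per day boundary between first_day and last_day.

    Assumes (hypothetically) that row keys start with the day as YYYYMMDD.
    N days need N - 1 split points, because the first region starts at the
    empty key and the last region is open-ended.
    """
    if last_day < first_day:
        raise ValueError("last_day must not be before first_day")
    keys = []
    day = first_day + timedelta(days=1)
    while day <= last_day:
        keys.append(day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return keys


# Four days of data -> three split points -> four regions.
splits = daily_split_keys(date(2013, 1, 1), date(2013, 1, 4))
print(splits)
```

A list like this could then be supplied as the split points when the table is created, for instance through the `SPLITS` option of `create` in the HBase shell.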

Upvotes: 0

Arnon Rotem-Gal-Oz

Reputation: 25909

HBase has a notion close to a partition, which is called a region. However, regions in HBase don't work like Hive (or RDBMS) partitions. Each region holds a range of keys, and you can break a key range into smaller regions by splitting it. For example, if your original region holds keys 0-9, you can divide it into two smaller regions, 0-4 and 5-9, or into ten regions: 0, 1, 2, and so on.
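To make the key-range idea concrete, here is a small model of how a sorted list of split points maps a row key to a region. It is only an illustration of the range lookup, not HBase's actual implementation, and `region_index` is a made-up helper name.

```python
from bisect import bisect_right
from typing import List


def region_index(split_keys: List[bytes], row_key: bytes) -> int:
    """Return the index of the region that would hold row_key.

    split_keys must be sorted. Region i covers the half-open range
    [split_keys[i - 1], split_keys[i]); the first region has no lower
    bound and the last region has no upper bound.
    """
    return bisect_right(split_keys, row_key)


# One split point at "5" gives two regions: keys 0-4 and keys 5-9.
two_regions = [b"5"]
# Nine split points give ten regions, one per key 0..9.
ten_regions = [str(i).encode() for i in range(1, 10)]

print(region_index(two_regions, b"4"), region_index(two_regions, b"5"))
print(region_index(ten_regions, b"7"))
```

Keys compare as raw bytes, which is why fixed-width, zero-padded key parts matter once keys are longer than one character.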

If you make your key composite, with the date as the first part followed by whatever your key is today, you can pre-split the HBase table so that each day gets one or more regions.
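A minimal sketch of such a composite key, assuming a fixed-width `YYYYMMDD` prefix and a `|` separator (both are illustrative choices, as are the helper names). The fixed width makes byte order match chronological order, so all rows of one day fall into one contiguous key range.

```python
from datetime import date, timedelta
from typing import Tuple


def make_row_key(day: date, original_key: str) -> bytes:
    """Build a composite key: fixed-width date prefix, then the existing key."""
    return f"{day:%Y%m%d}|{original_key}".encode("utf-8")


def day_scan_range(day: date) -> Tuple[bytes, bytes]:
    """Return (start, stop) covering every row of one day; stop is exclusive."""
    start = f"{day:%Y%m%d}".encode("utf-8")
    stop = f"{day + timedelta(days=1):%Y%m%d}".encode("utf-8")
    return start, stop


key = make_row_key(date(2013, 1, 2), "order-42")
start, stop = day_scan_range(date(2013, 1, 2))
print(key, start, stop)
```

Reading one day then becomes a range scan from `start` to `stop`, which plays the role a partition filter plays in Hive.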

You should note, however, that a key whose most significant bytes are sequential will slow down your writes (this may not be a problem if you're doing one-time loads). This problem is known as a "hot spot". You can read about it, and about a sample approach to overcoming it, in a blog post by Alex Baranau from Sematext.
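One common way to spread sequential keys is to prefix each key with a small bucket number derived from a stable hash (often called salting). The sketch below shows that general idea only; it is not necessarily the exact approach from the blog post, and the two-digit prefix, the `-` separator, the bucket count, and the function names are all assumptions.

```python
import zlib


def salted_key(row_key: bytes, buckets: int = 8) -> bytes:
    """Prefix row_key with a two-digit bucket derived from a stable hash.

    zlib.crc32 is used because it is deterministic across processes,
    unlike Python's built-in hash() for bytes.
    """
    if not 1 <= buckets <= 100:
        raise ValueError("buckets must be between 1 and 100")
    bucket = zlib.crc32(row_key) % buckets
    return b"%02d-" % bucket + row_key


def unsalted_key(key: bytes) -> bytes:
    """Strip the bucket prefix added by salted_key."""
    return key.split(b"-", 1)[1]


# Sequential keys for one day end up spread over several bucket prefixes.
sequential = [b"20130102|%04d" % i for i in range(100)]
prefixes = {salted_key(k)[:2] for k in sequential}
print(sorted(prefixes))
```

The trade-off is on the read side: a scan for one date has to be issued once per bucket and the results merged, since rows for that date no longer sit in a single contiguous range.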

Upvotes: 4
