Navyah

Reputation: 1680

Load data into an HBase table using the HBase MapReduce API

I am very new to HBase and the MapReduce API, and I am confused by the MapReduce concepts. I need to load a text file into an HBase table using the MapReduce API. I googled some examples, but in them I can only find a Mapper(), not a Reducer() method. I am confused about when to use a Mapper and when to use a Reducer().

I am thinking of it this way:

  1. To write data to HBase we use a Mapper.
  2. To read data from HBase we use a Mapper and a Reducer(). Can anyone please clarify this for me with a detailed explanation?
  3. I am trying to load data from a text file into an HBase table. I googled and tried some code, but I don't know how to load the text file and read it in the HBase MapReduce API.

I would be really thankful for any help.

Upvotes: 4

Views: 12918

Answers (3)

Prasad D

Reputation: 1504

Using HFileOutputFormat with completebulkload is the best and fastest way to load data into HBase. You will find sample code here.
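To illustrate the idea, here is a hedged sketch of such a bulk-load driver: a Mapper turns each line of the input file into a Put, and HFileOutputFormat.configureIncrementalLoad wires up the reducer and partitioner so the generated HFiles match the table's regions. The table name "mytable", the column family "cf", the qualifier "col", and the tab-separated "rowkey\tvalue" layout are all illustrative assumptions, not part of the original answer.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

    // Turns each "rowkey\tvalue" line into a Put keyed by the row key.
    static class LineToPutMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws java.io.IOException, InterruptedException {
            String[] fields = line.toString().split("\t", 2);
            byte[] row = Bytes.toBytes(fields[0]);
            Put put = new Put(row);
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                    Bytes.toBytes(fields[1]));
            ctx.write(new ImmutableBytesWritable(row), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "bulk-load");
        job.setJarByClass(BulkLoadDriver.class);
        job.setMapperClass(LineToPutMapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Sets the output key/value classes, the total-order partitioner and
        // the reducer so the HFiles line up with the table's region boundaries.
        HTable table = new HTable(conf, "mytable");
        HFileOutputFormat.configureIncrementalLoad(job, table);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

After the job finishes, the generated HFiles are moved into the table with completebulkload, e.g. `hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <outdir> mytable`.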

Upvotes: 1

Diego Pino

Reputation: 11576

With regard to your questions:

  • The Mapper receives splits of the input data and emits <key, value> pairs
  • The Reducer receives the Mapper's output, grouped as <key, list<values>>, and emits <key, value> pairs

Generally it will be your Reducer task that writes the results (to the filesystem or to HBase), but the Mapper can do that too. There are MapReduce jobs which don't require a Reducer at all. With regard to reading from HBase, it's the Mapper class that holds the configuration saying which table to read from. But there is no rule that the Mapper must be the reader and the Reducer the writer. The article "HBase MapReduce Examples" provides good examples of how to read from and write into HBase using MapReduce.
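As a hedged sketch of the "Mapper holds the read configuration" point: TableMapReduceUtil attaches a Scan to a TableMapper, and each map() call receives one row's Result. The table name "mytable" and the "cf"/"col" column are illustrative assumptions; this job simply emits each cell value with a count of 1 and runs with no Reducer.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReadFromHBase {

    // The Mapper is the reader here: each call gets one row's Result.
    static class CellValueMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                throws java.io.IOException, InterruptedException {
            byte[] cell = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
            if (cell != null) {
                ctx.write(new Text(Bytes.toString(cell)), ONE);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "read-from-hbase");
        job.setJarByClass(ReadFromHBase.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // fetch rows in larger batches for MR scans
        scan.setCacheBlocks(false);  // don't pollute the region server block cache

        // The scan and table name are wired to the Mapper here.
        TableMapReduceUtil.initTableMapperJob(
                "mytable", scan, CellValueMapper.class,
                Text.class, IntWritable.class, job);

        job.setNumReduceTasks(0);    // map-only: the Mapper's output is the result
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A Reducer could be added with setReducerClass to aggregate the counts, which is exactly the "Reducer as writer" case described above.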

In any case, if what you need is to bulk import some .csv files into HBase, you don't really need a MapReduce job for it. You can do it directly with the HBase API. In pseudocode:

table = hbase.createTable(tablename, fields); 
foreach (File file: dir) {
   content = readfile(file);    
   hbase.insert(table, content); 
}
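That pseudocode could look roughly like the following with the plain HBase client API (no MapReduce). The directory argument, the "mytable" table, the "cf"/"col" column, and the comma-separated "rowkey,value" file layout are assumptions for the sketch, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CsvImporter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");
        table.setAutoFlush(false);  // buffer Puts client-side instead of one RPC each
        for (File file : new File(args[0]).listFiles()) {
            BufferedReader reader = new BufferedReader(new FileReader(file));
            String line;
            while ((line = reader.readLine()) != null) {
                // "rowkey,value" -> one Put per line
                String[] fields = line.split(",", 2);
                Put put = new Put(Bytes.toBytes(fields[0]));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                        Bytes.toBytes(fields[1]));
                table.put(put);
            }
            reader.close();
        }
        table.flushCommits();  // push any buffered Puts
        table.close();
    }
}
```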

I wrote an importer of .mbox files into HBase. Take a look at the code; it may give you some ideas.

Once your data is imported into HBase, you can then code a MapReduce job to operate on that data.

Upvotes: 5

QuinnG

Reputation: 6424

Here are a couple of responses of mine that address loading data into HBase.

What is the fastest way to bulk load data into HBase programmatically?

Writing to HBase in MapReduce using MultipleOutputs

EDIT: Adding an additional link based on the comment. This link might help make the file available for processing.
Import external libraries in an Hadoop MapReduce script

Upvotes: 0

Related Questions