Reputation: 1680
I am very new to HBase and the MapReduce API.
I am quite confused by the MapReduce concepts. I need to load a text file into an HBase table using the MapReduce API. I googled some examples, but in those I can only find a Mapper() and no Reducer method. I am confused about when to use a Mapper and when to use a Reducer().
My thinking so far is:
I would be really thankful for any help.
Upvotes: 4
Views: 12918
Reputation: 1504
Using HFileOutputFormat with CompleteBulkLoad is the best and fastest way to load data into HBase. You will find sample code here
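To give a rough idea of what that driver looks like (a sketch only: MyPutMapper is a hypothetical mapper class emitting (ImmutableBytesWritable, Put) pairs, and "mytable" is a placeholder table name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "bulk-load");
        job.setJarByClass(BulkLoadDriver.class);
        job.setMapperClass(MyPutMapper.class);  // hypothetical mapper emitting Puts
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input text files
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HFile output dir

        // Configures the job to write HFiles sorted and partitioned to
        // match the target table's regions.
        HTable table = new HTable(conf, "mytable");  // placeholder table name
        HFileOutputFormat.configureIncrementalLoad(job, table);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

After the job finishes, the generated HFiles are moved into the table with the completebulkload tool, e.g. hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles &lt;output-dir&gt; mytable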
Upvotes: 1
Reputation: 11576
With regard to your questions:
Generally, it will be your Reducer task that writes the results (to the filesystem or to HBase), but the Mapper can do that too. There are MapReduce jobs which don't require a Reducer at all. With regard to reading from HBase, it's the Mapper class that carries the configuration saying which table to read from. But there is no rule that the Mapper is a reader and the Reducer a writer. The article "HBase MapReduce Examples" provides good examples of how to read from and write into HBase using MapReduce.
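For example, here is a minimal sketch of a Mapper-only job that loads a text file into HBase with no Reducer at all. It rests on my own assumptions: a table named "mytable" with a column family "cf", and input lines shaped like "rowkey,value".

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class TextToHBase {

    // Map-only job: each input line (assumed "rowkey,value") becomes one Put.
    static class LineMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split(",", 2); // assumed layout
            byte[] rowKey = Bytes.toBytes(parts[0]);
            Put put = new Put(rowKey);
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("val"),
                    Bytes.toBytes(parts[1]));
            context.write(new ImmutableBytesWritable(rowKey), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "text-to-hbase");
        job.setJarByClass(TextToHBase.class);
        job.setMapperClass(LineMapper.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Wires TableOutputFormat to "mytable"; passing null for the
        // reducer class and setting zero reduce tasks makes this map-only.
        TableMapReduceUtil.initTableReducerJob("mytable", null, job);
        job.setNumReduceTasks(0);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Despite its name, initTableReducerJob with a null reducer class just sets up TableOutputFormat for the job; with zero reduce tasks, the Mapper's output goes straight into the table.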
In any case, if what you need is to bulk import some .csv files into HBase, you don't really need to do it with a MapReduce job. You can do it directly with the HBase API. In pseudocode:
table = hbase.createTable(tablename, fields);
foreach (File file : dir) {
    content = readfile(file);
    hbase.insert(table, content);
}
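A rough Java version of that pseudocode with the plain HBase client API (again a sketch: the table name "mytable", family "cf", and the "rowkey,value" line layout are assumptions of mine):

import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SimpleImporter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Create the table if it doesn't exist yet.
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists("mytable")) {
            HTableDescriptor desc = new HTableDescriptor("mytable");
            desc.addFamily(new HColumnDescriptor("cf"));
            admin.createTable(desc);
        }

        HTable table = new HTable(conf, "mytable");
        for (File file : new File(args[0]).listFiles()) {
            for (String line : Files.readAllLines(file.toPath(),
                    StandardCharsets.UTF_8)) {
                String[] parts = line.split(",", 2); // assumed "rowkey,value" layout
                Put put = new Put(Bytes.toBytes(parts[0]));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("val"),
                        Bytes.toBytes(parts[1]));
                table.put(put); // writes one row
            }
        }
        table.close();
        admin.close();
    }
}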
I wrote an importer of .mbox files into HBase. Take a look at the code; it may give you some ideas.
Once your data is imported into HBase, you will then need to code a MapReduce job to operate on that data.
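A minimal sketch of such a read-side job, in the spirit of the "HBase MapReduce Examples" article (the table name and Scan tuning values are placeholders of mine), could look like:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ScanTable {

    // Reads rows from HBase; here it just counts them via a job counter.
    static class RowCountMapper extends TableMapper<Text, IntWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            context.getCounter("example", "rows").increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "scan-mytable");
        job.setJarByClass(ScanTable.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // fetch more rows per RPC
        scan.setCacheBlocks(false);  // recommended for MapReduce scans

        // "mytable" is a placeholder; the Mapper gets the table to read
        // from through this job configuration.
        TableMapReduceUtil.initTableMapperJob("mytable", scan,
                RowCountMapper.class, Text.class, IntWritable.class, job);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}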
Upvotes: 5
Reputation: 6424
Here are a couple of responses of mine that address loading data into HBase.
What is the fastest way to bulk load data into HBase programmatically?
Writing to HBase in MapReduce using MultipleOutputs
EDIT: Adding an additional link based on a comment.
This link might help make the file available for processing.
Import external libraries in an Hadoop MapReduce script
Upvotes: 0