Reputation: 10099
I have a MapReduce job with input to be sourced from HTable. From Java MapReduce code, how do you set the Job inputformat to the HBase TableInputFormat?
Is there anything like a JDBC connection to connect to the HTable database?
Upvotes: 0
Views: 600
Reputation: 25909
HBase comes with a TableMapReduceUtil class that makes it easy to set up MapReduce jobs.
Here's the first sample from the manual:
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleRead");
job.setJarByClass(MyReadJob.class); // class that contains mapper
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don't set to true for MR jobs
// set other scan attrs
...
TableMapReduceUtil.initTableMapperJob(
tableName, // input HBase table name
scan, // Scan instance to control CF and attribute selection
MyMapper.class, // mapper
null, // mapper output key
null, // mapper output value
job);
job.setOutputFormatClass(NullOutputFormat.class); // because we aren't emitting anything from mapper
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
Upvotes: 1
Reputation: 34184
If your client and HBase are running on the same machine, you don't need to configure anything for your client to talk to HBase. Just create a Configuration through HBaseConfiguration and connect to your HTable:
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "TABLE_NAME");
But if your client is running on a remote machine, it relies on ZooKeeper in order to talk to your HBase cluster, so the client needs the location of the ZooKeeper ensemble before it can proceed. This is how we normally configure our clients to connect to an HBase cluster:
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "ZK_MACHINE_IP/HOSTNAME");
conf.set("hbase.zookeeper.property.clientPort","2181");
HTable table = new HTable(conf, "TABLE_NAME");
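When the ensemble has more than one node, hbase.zookeeper.quorum takes a comma-separated list of hosts. Here is a minimal sketch of assembling those two settings. ZkClientSettings, forQuorum, and the zk1..zk3 host names are hypothetical, and a plain Map stands in for the HBase Configuration so the snippet runs without HBase jars on the classpath:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HBase: collects the two client-side
// ZooKeeper settings in a plain Map so the sketch runs without HBase jars.
final class ZkClientSettings {

    static Map<String, String> forQuorum(List<String> zkHosts, int clientPort) {
        if (zkHosts == null || zkHosts.isEmpty()) {
            throw new IllegalArgumentException("at least one ZooKeeper host is required");
        }
        if (clientPort < 1 || clientPort > 65535) {
            throw new IllegalArgumentException("invalid client port: " + clientPort);
        }
        Map<String, String> settings = new LinkedHashMap<>();
        // hbase.zookeeper.quorum takes a comma-separated list of hosts
        settings.put("hbase.zookeeper.quorum", String.join(",", zkHosts));
        settings.put("hbase.zookeeper.property.clientPort", Integer.toString(clientPort));
        return settings;
    }

    public static void main(String[] args) {
        // zk1..zk3 are placeholder host names for illustration only
        Map<String, String> settings = forQuorum(Arrays.asList("zk1", "zk2", "zk3"), 2181);
        // Print each key=value pair in insertion order
        settings.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real client you would copy each entry into the Configuration from HBaseConfiguration.create(), for example with settings.forEach(conf::set), before opening the table.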
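This is how you do it through the Java API. HBase supports some other APIs as well. You can find more on this here.
Coming to your first question: if you need to use TableInputFormat as the InputFormat in your MR job, you set it through the Job object. TableInputFormat also reads the table name from the job configuration (key TableInputFormat.INPUT_TABLE), so that has to be set as well; TableMapReduceUtil.initTableMapperJob does both for you. Setting the input format by hand looks like this: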
job.setInputFormatClass(TableInputFormat.class);
Hope this answers your question.
Upvotes: 1