T. Webster

Reputation: 10099

MapReduce input from HTable

I have a MapReduce job whose input should come from an HBase table (HTable). From Java MapReduce code, how do I set the job's InputFormat to HBase's TableInputFormat?

Is there anything like a JDBC connection for connecting to the HBase table?

Upvotes: 0

Views: 600

Answers (2)

Arnon Rotem-Gal-Oz

Reputation: 25909

HBase comes with a TableMapReduceUtil class that makes it easy to set up MapReduce jobs. Here's the first sample from the manual:

Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleRead");
job.setJarByClass(MyReadJob.class);     // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs
...

TableMapReduceUtil.initTableMapperJob(
  tableName,        // input HBase table name
  scan,             // Scan instance to control CF and attribute selection
  MyMapper.class,   // mapper
  null,             // mapper output key
  null,             // mapper output value
  job);
job.setOutputFormatClass(NullOutputFormat.class);   // because we aren't emitting anything from mapper

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}
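
The sample passes MyMapper.class without showing it. Here is a minimal sketch of what such a mapper might look like, assuming the HBase MapReduce classes are on the classpath. The NullWritable output types are my assumption (they fit a job that emits nothing), and the counter group and name are made up for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;

// TableMapper fixes the input types: the key is the row key and the
// value is the Result holding the cells that the Scan selected.
public class MyMapper extends TableMapper<NullWritable, NullWritable> {

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
            throws IOException, InterruptedException {
        // Decode the row key; real processing of the row would go here.
        String row = Bytes.toString(columns.getRow());

        // Hypothetical counter, only so the job reports how many rows it saw.
        context.getCounter("example", "rowsSeen").increment(1);
    }
}
```

Nothing is written to the context, which is why the job can use NullOutputFormat.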

Upvotes: 1

Tariq

Reputation: 34184

If your client and HBase are running on the same machine, you don't need to configure anything for your client to talk to HBase. Just create an HBaseConfiguration instance and connect to your HTable:

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "TABLE_NAME");

But if your client is running on a remote machine, it relies on ZooKeeper in order to talk to your HBase cluster, so the client needs the location of the ZooKeeper ensemble before it can proceed. This is how we normally configure our clients to connect to an HBase cluster:

Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "ZK_MACHINE_IP/HOSTNAME");
conf.set("hbase.zookeeper.property.clientPort","2181");
HTable table = new HTable(conf, "TABLE_NAME");
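
If the ensemble has more than one node, hbase.zookeeper.quorum takes a comma-separated list of hosts. Here is a small sketch, using only the JDK, of building that value from a list. The class name QuorumString and the host names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class QuorumString {

    // Joins host names into the comma-separated form used for
    // hbase.zookeeper.quorum, trimming whitespace and skipping blanks.
    public static String quorum(List<String> hosts) {
        return hosts.stream()
                .map(String::trim)
                .filter(h -> !h.isEmpty())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Hypothetical hosts; replace with your own ensemble.
        List<String> hosts = Arrays.asList("zk1.example.com", " zk2.example.com ", "zk3.example.com");
        System.out.println(quorum(hosts));
    }
}
```

The result can then be passed as the second argument of conf.set("hbase.zookeeper.quorum", ...).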

This is how you do it through the Java API. HBase supports other APIs as well, such as REST and Thrift.

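Coming to your first question: to use TableInputFormat as the InputFormat in your MR job, set it on the Job object. TableInputFormat also reads the table name from the job configuration (the TableInputFormat.INPUT_TABLE key), which TableMapReduceUtil.initTableMapperJob sets for you along with the input format. The call itself looks like this: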

job.setInputFormatClass(TableInputFormat.class);

Hope this answers your question.

Upvotes: 1
