Shashank.Kr

Reputation: 77

Bulk Insert Data into HBase using MapReduce

I need to insert 400 million rows into an HBase table.

The schema is simple: I am generating the row key by simply concatenating two ints, and the value is System.nanoTime().

My mapper looks something like this:

public class DatasetMapper extends TableMapper<Text, LongWritable> {

    private static Configuration conf = HBaseConfiguration.create();

    public void map(Text key, LongWritable values, Context context) throws Exception {

        // instantiate an HTable object that connects to the target table
        HTable htable = new HTable(conf, "temp"); // temp table already created
        htable.setAutoFlush(false);
        htable.setWriteBufferSize(1024 * 1024 * 12);

        // construct the key and write one Put per row
        int i = 0, j = 0;
        for (i = 0; i < 400000000; i++) {
            String rowkey = Integer.toString(i).concat(Integer.toString(j));
            long value = Math.abs(System.nanoTime());
            Put put = new Put(Bytes.toBytes(rowkey));
            put.add(Bytes.toBytes("location"), Bytes.toBytes("longlat"), Bytes.toBytes(value));
            htable.put(put);
            j++;
            htable.flushCommits();
        }
    }
}

And my job looks like this:

Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "initdb");
job.setJarByClass(DatasetMapper.class);    // class that contains mapper

TableMapReduceUtil.initTableMapperJob(
        null,                   // input table
        null,                   // scan
        DatabaseMapper.class,   // mapper class
        null,                   // mapper output key
        null,                   // mapper output value
        job);
TableMapReduceUtil.initTableReducerJob(
        "temp",                 // output table
        null,                   // reducer class
        job);
job.setNumReduceTasks(0);

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}

The job runs but inserts 0 records. I know I am making some mistake somewhere, but I am not able to spot it as I am new to HBase. Please help me.

Thanks.

Upvotes: 1

Views: 3606

Answers (1)

Tariq

Reputation: 34184

First things first: the name of your mapper is DatasetMapper, but in your job config you have specified DatabaseMapper. I am wondering how it runs without any error.

Next, it looks like you have mixed TableMapper and plain Mapper usage together. HBase's TableMapper is an abstract class which extends Hadoop's Mapper and makes it convenient to read from HBase, while TableReducer helps in writing back to HBase. You are trying to put data from your mapper while also using a TableReducer at the same time, so your mapper will actually never get called (see the simplified declarations below).
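For reference, the library declarations look roughly like this (simplified, and the exact type bounds vary a little between HBase versions). This is why a TableMapper's map() only ever receives an ImmutableBytesWritable row key and a Result from a scan, and why your map(Text, LongWritable, Context) is just an extra overload the framework never invokes:

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Simplified: TableMapper pins the input types to what a Scan over HBase emits,
// so map() always receives (ImmutableBytesWritable rowKey, Result row).
public abstract class TableMapper<KEYOUT, VALUEOUT>
        extends Mapper<ImmutableBytesWritable, Result, KEYOUT, VALUEOUT> {
}

// Simplified: TableReducer pins the output value type to something HBase can apply
// (a Put or Delete, both Writable in this API generation).
public abstract class TableReducer<KEYIN, VALUEIN, KEYOUT>
        extends Reducer<KEYIN, VALUEIN, KEYOUT, Writable> {
}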

Either use a TableReducer to put the data, or use just a plain Mapper. If you really wish to do it in your Mapper, you can use the TableOutputFormat class. See the example given on page 301 of HBase: The Definitive Guide (there is a Google Books version of it).
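Here is a rough sketch of the Mapper + TableOutputFormat route. To be clear about what is assumed: the class name BulkPutMapper, the idea of driving the loop from a small text file of "start end" key ranges, and the input path argument are all made up for illustration; the table name, column family and qualifier follow your example, and it uses the same 0.94-era API (new Job, put.add) as your code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// A plain Hadoop Mapper (not a TableMapper) that emits Puts.
// Each input line is assumed to carry a "start end" range of row ids to generate.
public class BulkPutMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] range = line.toString().split("\\s+");
        long start = Long.parseLong(range[0]);
        long end = Long.parseLong(range[1]);

        for (long i = start; i < end; i++) {
            byte[] rowkey = Bytes.toBytes(Long.toString(i));
            Put put = new Put(rowkey);
            put.add(Bytes.toBytes("location"), Bytes.toBytes("longlat"),
                    Bytes.toBytes(System.nanoTime()));
            // TableOutputFormat performs the actual writes; no HTable,
            // setAutoFlush or flushCommits needed inside the mapper.
            context.write(new ImmutableBytesWritable(rowkey), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "initdb");
        job.setJarByClass(BulkPutMapper.class);

        job.setMapperClass(BulkPutMapper.class);
        job.setNumReduceTasks(0);                      // map-only job

        // Ordinary text input just to drive the mappers.
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Route the mapper's (ImmutableBytesWritable, Put) output straight into the table.
        job.setOutputFormatClass(TableOutputFormat.class);
        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "temp");
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Put.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The mapper only emits Put objects; TableOutputFormat does the actual HBase writes. Splitting the 400 million keys across several input lines also lets Hadoop spread the generation over multiple map tasks instead of one huge loop in a single mapper.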

HTH

P.S.: You might find these links helpful in learning HBase + MR integration properly:

Link 1.

Link 2.

Upvotes: 3
