Reputation: 77
I need to insert 400 million rows into an HBase table.
Schema looks something like this
I generate each row key by simply concatenating two ints, and the value is System.nanoTime().
My mapper looks something like this:
    public class DatasetMapper extends TableMapper<Text, LongWritable> {
        private static Configuration conf = HBaseConfiguration.create();

        public void map(Text key, LongWritable values, Context context) throws Exception {
            // instantiate HTable object that connects to table name
            HTable htable = new HTable(conf, "temp"); // already created temp table
            htable.setAutoFlush(false);
            htable.setWriteBufferSize(1024 * 1024 * 12);
            // construct key
            int i = 0, j = 0;
            for (i = 0; i < 400000000; i++) {
                String rowkey = Integer.toString(i).concat(Integer.toString(j));
                Long value = Math.abs(System.nanoTime());
                Put put = new Put(Bytes.toBytes(rowkey));
                put.add(Bytes.toBytes("location"), Bytes.toBytes("longlat"), Bytes.toBytes(value));
                htable.put(put);
                j++;
                htable.flushCommits();
            }
        }
    }
And my job looks like this:
    Configuration config = HBaseConfiguration.create();
    Job job = new Job(config, "initdb");
    job.setJarByClass(DatasetMapper.class); // class that contains mapper
    TableMapReduceUtil.initTableMapperJob(
        null, // input table
        null,
        DatabaseMapper.class, // mapper class
        null, // mapper output key
        null, // mapper output value
        job);
    TableMapReduceUtil.initTableReducerJob(
        temp, // output table
        null, // reducer class
        job);
    job.setNumReduceTasks(0);
    boolean b = job.waitForCompletion(true);
    if (!b) {
        throw new IOException("error with job!");
    }
The job runs but inserts 0 records. I know I am making a mistake somewhere, but I can't find it because I am new to HBase. Please help me.
thanks
Upvotes: 1
Views: 3606
Reputation: 34184
First things first: the name of your mapper is DatasetMapper, but in your job config you have specified DatabaseMapper. I wonder how it runs without any error.
Next, it looks like you have mixed up the usage of TableMapper and Mapper. HBase's TableMapper is an abstract class that extends Hadoop's Mapper and makes it convenient to read from HBase, while TableReducer helps in writing back to HBase. You are trying to put data from your Mapper while also using TableReducer. Your mapper will never actually get called.
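One concrete way a map method can end up never being called, and possibly part of what is happening here, is a signature mismatch: TableMapper<Text, LongWritable> fixes the input types to ImmutableBytesWritable and Result, so map(Text, LongWritable, Context) is an overload rather than an override, and the framework keeps calling the inherited default. The sketch below shows the same effect with plain Java only. BaseMapper, WrongMapper and RightMapper are hypothetical stand-ins, not HBase or Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

public class OverloadDemo {
    // Hypothetical, simplified stand-in for a framework base class such as Hadoop's Mapper.
    static class BaseMapper<KEYIN, VALUEIN> {
        final List<String> calls = new ArrayList<>();

        // Default implementation that subclasses are expected to override.
        protected void map(KEYIN key, VALUEIN value) {
            calls.add("base");
        }

        // The "framework" only ever invokes map(KEYIN, VALUEIN).
        final void run(KEYIN key, VALUEIN value) {
            map(key, value);
        }
    }

    static class WrongMapper extends BaseMapper<Integer, String> {
        // Parameter types differ from (Integer, String): this is an overload, not an override.
        public void map(String key, Long value) {
            calls.add("custom");
        }
    }

    static class RightMapper extends BaseMapper<Integer, String> {
        // @Override makes the compiler reject a mismatched signature.
        @Override
        protected void map(Integer key, String value) {
            calls.add("custom");
        }
    }

    // Runs one record through the mapper and reports which map method handled it.
    static String whoRan(BaseMapper<Integer, String> mapper) {
        mapper.run(1, "row");
        return mapper.calls.get(0);
    }

    public static void main(String[] args) {
        System.out.println(whoRan(new WrongMapper())); // prints "base"
        System.out.println(whoRan(new RightMapper())); // prints "custom"
    }
}
```

Adding @Override to your own map method is a cheap way to catch this kind of mismatch at compile time.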
Either use a TableReducer to put the data, or use just a plain Mapper. If you really wish to do it in your Mapper, you can use the TableOutputFormat class. See the example given on page 301 of HBase: The Definitive Guide. This is the Google Books link
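A minimal sketch of the Mapper-only route, under some assumptions that are not in your post: the job reads a small text file (path passed as args[0]) in which each line holds a start and an end index separated by whitespace, and the cluster uses the same older client API as your code (Put.add; newer releases use addColumn instead). The class name InitDb and the input layout are made up for illustration. initTableReducerJob with a null reducer is used only to configure TableOutputFormat for the "temp" table.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class InitDb {

    // Plain Hadoop Mapper: reads text lines, emits Puts that TableOutputFormat writes to HBase.
    public static class DatasetMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

        private static final byte[] FAMILY = Bytes.toBytes("location");
        private static final byte[] QUALIFIER = Bytes.toBytes("longlat");

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Assumed input layout: "<start> <end>", one index range per line.
            String[] range = line.toString().trim().split("\\s+");
            int start = Integer.parseInt(range[0]);
            int end = Integer.parseInt(range[1]);
            for (int i = start; i < end; i++) {
                // Same key scheme as in the question, where j always equals i.
                byte[] rowkey = Bytes.toBytes(Integer.toString(i).concat(Integer.toString(i)));
                Put put = new Put(rowkey);
                put.add(FAMILY, QUALIFIER, Bytes.toBytes(System.nanoTime()));
                // No HTable here: the output format handles buffering and flushing.
                context.write(new ImmutableBytesWritable(rowkey), put);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        Job job = new Job(config, "initdb");
        job.setJarByClass(InitDb.class);
        job.setMapperClass(DatasetMapper.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Null reducer: this call only sets TableOutputFormat and the output table.
        TableMapReduceUtil.initTableReducerJob("temp", null, job);
        job.setNumReduceTasks(0); // map-only job
        if (!job.waitForCompletion(true)) {
            throw new IOException("error with job!");
        }
    }
}
```

Splitting the 400 million indices across several input lines also lets Hadoop spread the work over multiple map tasks instead of generating everything in a single map call.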
HTH
P.S. : You might find these links helpful in learning HBase+MR integration properly :
Upvotes: 3