Reputation: 279
I want to insert N rows into an HBase table from each mapper in a batch. I am currently aware of two ways of doing this:

1. The put(List&lt;Put&gt; puts) method of an HTable instance, also making sure to disable the autoFlush parameter.
2. The context.write(rowKey, put) method.

Which one is better?
In the 1st way, context.write() is not required, since hTable.put(putsList) is used to put data directly into the table. My mapper class extends Mapper&lt;KEYIN,VALUEIN,KEYOUT,VALUEOUT&gt;, so what classes should I use for KEYOUT and VALUEOUT?
In the 2nd way, I have to call context.write(rowKey, put) N times. Is there any way I can use context.write() for a list of Put operations?
Is there any other way of doing this with MapReduce?
Thanks in advance.
Upvotes: 2
Views: 1668
Reputation: 29227
I prefer the second option, where batching is natural for MapReduce (no need for a list of puts). For deeper insight, please see my second point.
1) Your first option, List&lt;Put&gt;, is generally used by a standalone HBase Java client. Internally it is controlled by hbase.client.write.buffer, for example in one of your config XMLs:
<property>
  <name>hbase.client.write.buffer</name>
  <value>20971520</value> <!-- 20 MB -->
</property>
The default value is 2 MB (2097152 bytes). Once the buffer is filled, it flushes all puts to actually insert them into your table. This is the same mechanism as BufferedMutator, explained in #2.
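For illustration, here is a minimal sketch of option 1 using the older HTable client API the question refers to (pre-1.0 HBase; newer versions replace this pattern with BufferedMutator). The table name, column family, qualifier, and row count are placeholder assumptions:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchPutExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // Optionally raise the client write buffer (in bytes) in code
    // instead of editing hbase-site.xml.
    conf.setLong("hbase.client.write.buffer", 20971520L); // 20 MB

    HTable table = new HTable(conf, "my_table"); // placeholder table name
    table.setAutoFlush(false); // buffer puts client-side instead of sending per call

    List<Put> puts = new ArrayList<Put>();
    for (int i = 0; i < 1000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
      puts.add(put);
    }
    table.put(puts);      // buffered until hbase.client.write.buffer fills
    table.flushCommits(); // force-flush any remaining buffered puts
    table.close();
  }
}
```

Running this requires the hbase-client dependency and a reachable HBase cluster.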
2) Regarding the second option, look at the TableOutputFormat documentation:
org.apache.hadoop.hbase.mapreduce
Class TableOutputFormat<KEY>
java.lang.Object
org.apache.hadoop.mapreduce.OutputFormat<KEY,Mutation>
org.apache.hadoop.hbase.mapreduce.TableOutputFormat<KEY>
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable
@InterfaceAudience.Public
@InterfaceStability.Stable
public class TableOutputFormat<KEY>
extends org.apache.hadoop.mapreduce.OutputFormat<KEY,Mutation>
implements org.apache.hadoop.conf.Configurable
Convert Map/Reduce output and write it to an HBase table. The KEY is ignored
while the output value must be either a Put or a Delete instance.
Another way of seeing this is through the code below:
/**
 * Writes a key/value pair into the table.
 *
 * @param key The key.
 * @param value The value.
 * @throws IOException When writing fails.
 * @see RecordWriter#write(Object, Object)
 */
@Override
public void write(KEY key, Mutation value) throws IOException {
  if (!(value instanceof Put) && !(value instanceof Delete)) {
    throw new IOException("Pass a Delete or a Put");
  }
  mutator.mutate(value);
}
Conclusion: context.write(rowKey, putList) is not possible with the API.
However, BufferedMutator (which mutator.mutate in the code above writes through) buffers mutations client-side and flushes them to the cluster in batches, so your batching is natural with BufferedMutator, as aforementioned.
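To make option 2 concrete, here is a minimal mapper sketch that emits Put objects through TableOutputFormat. This also answers the KEYOUT/VALUEOUT question: KEYOUT is ImmutableBytesWritable (ignored by TableOutputFormat) and VALUEOUT is Put. The input types, column family, and qualifier are placeholder assumptions:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// KEYOUT = ImmutableBytesWritable (the row key; TableOutputFormat ignores it),
// VALUEOUT = Put (the Mutation that TableOutputFormat actually writes).
public class HBasePutMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Emit one Put per row; the underlying BufferedMutator batches them,
    // so calling context.write() N times is fine.
    byte[] rowKey = Bytes.toBytes(value.toString());
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v")); // placeholder cell
    context.write(new ImmutableBytesWritable(rowKey), put);
  }
}
```

In the driver you would set TableOutputFormat as the output format and configure the target table (for example via TableMapReduceUtil), with the job's number of reduce tasks set to zero for a map-only job.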
Upvotes: 1