Reputation: 5397
I was trying to insert data into HBase using the following commands:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,f:pageviews,f:visit -Dimporttsv.separator=\001 -Dimporttsv.bulk.output=output modelvar /000000.gz
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles modelvar output
where modelvar is the final HBase table in which the data is supposed to be stored, and output is the HDFS path where the HFiles are written. The problem is that the data I am trying to insert is the output of Hive, so the separator is Hive's default, \001, which I can't change. I therefore set the -Dimporttsv.separator= value to \001, but typed literally that is a multi-character string, and ImportTsv doesn't accept a multi-character separator. So how do I insert data written by Hive into HBase?
Upvotes: 3
Views: 2701
Reputation: 11
Finally found an answer: pass the separator as $(echo -e "\002"). This works for all shell commands.
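Applied to the command from the question, the same trick passes Hive's default \001 field delimiter as a single byte (a sketch reusing the question's table name, columns, and paths; quoting the whole -D option keeps the control byte as one argument):

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,f:pageviews,f:visit \
    "-Dimporttsv.separator=$(echo -e '\001')" \
    -Dimporttsv.bulk.output=output modelvar /000000.gz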
Upvotes: 1
Reputation: 161
IMO, you can't put a raw byte in the Hadoop configuration.
But notice that the property 'importttsv.separator', defined by org.apache.hadoop.hbase.mapreduce.ImportTsv.SEPARATOR_CONF_KEY, is Base64-encoded in org.apache.hadoop.hbase.mapreduce.ImportTsv:245:
public static Job createSubmittableJob(Configuration conf, String[] args)
    throws IOException, ClassNotFoundException {
  // Support non-XML supported characters
  // by re-encoding the passed separator as a Base64 string.
  String actualSeparator = conf.get(SEPARATOR_CONF_KEY);
  if (actualSeparator != null) {
    conf.set(SEPARATOR_CONF_KEY,
        Base64.encodeBytes(actualSeparator.getBytes()));
  }
  ...
}
This is decoded back in org.apache.hadoop.hbase.mapreduce.ImportTsv:92:
protected void doSetup(Context context) {
  Configuration conf = context.getConfiguration();

  // If a custom separator has been used,
  // decode it back from Base64 encoding.
  separator = conf.get(ImportTsv.SEPARATOR_CONF_KEY);
  if (separator == null) {
    separator = ImportTsv.DEFAULT_SEPARATOR;
  } else {
    separator = new String(Base64.decode(separator));
  }
  ...
}
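To see the round-trip concretely, the Base64 form of the single \001 byte can be checked from the shell (an illustration, not part of ImportTsv):

$ echo -ne '\001' | base64
AQ==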
Finally, the separator is checked to be a single byte in org.apache.hadoop.hbase.mapreduce.ImportTsv:97:
public TsvParser(String columnsSpecification, String separatorStr) {
  // Configure separator
  byte[] separator = Bytes.toBytes(separatorStr);
  Preconditions.checkArgument(separator.length == 1,
      "TsvParser only supports single-byte separators");
  separatorByte = separator[0];
  ...
}
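This check is exactly what rejects a separator typed literally on the command line: without shell interpretation, \001 arrives as four characters rather than one byte, as a quick check shows:

$ echo -n '\001' | wc -c     # typed literally: backslash, 0, 0, 1
4
$ echo -ne '\001' | wc -c    # interpreted by echo -e: the single \001 byte
1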
As a solution, I suggest redeclaring a main method that injects the separator before ImportTsv executes.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.ImportTsv;

public class ImportTsvByteSeparator extends ImportTsv
{
  /**
   * Main entry point.
   *
   * @param args The command line parameters.
   * @throws Exception When running the job fails.
   */
  public static void main(String[] args) throws Exception {
    // Read the separator byte from the configuration
    // (defaults to 1, i.e. \001, Hive's field delimiter).
    Configuration conf = HBaseConfiguration.create();
    int byteSeparator = conf.getInt("importtsv.byte_separator", 1);
    String separator = Character.toString((char) byteSeparator);

    // ImportTsv.main() creates its own Configuration, so setting the
    // property on a local conf here would be silently ignored. Pass the
    // separator through as a -D argument instead; ImportTsv's
    // GenericOptionsParser moves it into the job configuration, and
    // createSubmittableJob() then Base64-encodes it as shown above.
    String[] newArgs = new String[args.length + 1];
    newArgs[0] = "-Dimporttsv.separator=" + separator;
    System.arraycopy(args, 0, newArgs, 1, args.length);

    // Now call ImportTsv's main method with the augmented arguments
    ImportTsv.main(newArgs);
  }
}
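Compiled and placed on the classpath, it can then be launched like the stock tool (a hypothetical invocation; the jar name importtsv-byte-separator.jar is made up for the example):

HBASE_CLASSPATH=importtsv-byte-separator.jar hbase ImportTsvByteSeparator \
    -Dimporttsv.columns=HBASE_ROW_KEY,f:pageviews,f:visit \
    -Dimporttsv.bulk.output=output modelvar /000000.gz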
I don't think we can override a method deeper inside the process (such as createSubmittableJob()) because of the visibility of the attributes.
Upvotes: 1