Guo

Reputation: 1803

How to extract key/value pairs from an HBase SequenceFile using MapReduce?

I used the HBase Export utility to export an HBase table to HDFS as a SequenceFile.

Now I want to process this file with a MapReduce job:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapSequencefile {
    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value,
                Mapper<LongWritable, Text, Text, Text>.Context context)
                throws IOException, InterruptedException {
            System.out.println(key + "...." + value);
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, MapSequencefile.class.getSimpleName());

        job.setJarByClass(MapSequencefile.class);
        job.setNumReduceTasks(0); // map-only job
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setInputFormatClass(SequenceFileInputFormat.class); // use SequenceFileInputFormat
        FileInputFormat.setInputPaths(job, "hdfs://192.16.31.10:8020/input/");
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.16.31.10:8020/out/"));
        job.waitForCompletion(true);
    }
}

But it always throws this exception:

Caused by: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1964)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1760)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)

What can I do to fix this error?

Upvotes: 1

Views: 1146

Answers (1)

Binary Nerd

Reputation: 13927

I assume you're using this to do the export:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

This is described in the HBase documentation: http://hbase.apache.org/0.94/book/ops_mgt.html#export

Looking at the source code for org.apache.hadoop.hbase.mapreduce.Export, you can see that it sets:

job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Result.class);

This matches your error (the value is a Result object):

Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'

So your map signature needs to change to:

Mapper<ImmutableBytesWritable, Result, Text, Text>

And you'll need to include the correct HBase library in your project so it has access to:

org.apache.hadoop.hbase.client.Result
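
One more thing worth checking, since the exception text names 'io.serializations' explicitly: in HBase 0.96 and later, Result no longer implements Writable, so Hadoop's default serializers cannot read it even when the class is on the classpath. In that case the job configuration likely also needs HBase's org.apache.hadoop.hbase.mapreduce.ResultSerialization appended to that property. The following is a minimal sketch of building the property value without dropping the existing entries. The class name SerializationsConfig and the helper withSerializations are hypothetical names made up for illustration, and whether ResultSerialization is present depends on your HBase version.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SerializationsConfig {
    // Hadoop property named in the exception message.
    static final String IO_SERIALIZATIONS = "io.serializations";

    // Serializer for Result values; assumed to exist in the HBase version in use (0.96+).
    static final String RESULT_SERIALIZATION =
            "org.apache.hadoop.hbase.mapreduce.ResultSerialization";

    /**
     * Appends extra serialization class names to an existing comma-separated
     * list, keeping the original order and skipping blanks and duplicates.
     */
    static String withSerializations(String existing, String... extra) {
        Set<String> names = new LinkedHashSet<>();
        if (existing != null) {
            for (String name : existing.split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    names.add(trimmed);
                }
            }
        }
        for (String name : extra) {
            names.add(name);
        }
        return String.join(",", names);
    }

    public static void main(String[] args) {
        // Stand-in for conf.get("io.serializations"); the real default list may be longer.
        String current = "org.apache.hadoop.io.serializer.WritableSerialization";
        System.out.println(withSerializations(current, RESULT_SERIALIZATION));

        // In the driver, before calling Job.getInstance(conf, ...), this would be applied as:
        //   conf.set(IO_SERIALIZATIONS,
        //            withSerializations(conf.get(IO_SERIALIZATIONS), RESULT_SERIALIZATION));
    }
}
```

Appending rather than overwriting matters here: replacing the property outright would remove WritableSerialization, and the ImmutableBytesWritable keys would then fail to deserialize instead.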

Upvotes: 1
