Jerry Ragland

Reputation: 621

Reading Key from SequenceFileAsBinaryInputFormat

I am trying to read a SequenceFile in my MapReduce program, using SequenceFileAsBinaryInputFormat as the input format for the Mapper. The sequence file has an IntWritable as key and an ArrayWritable as value.

job.setInputFormatClass(SequenceFileAsBinaryInputFormat.class);

The Mapper receives BytesWritable for both the key and the value.

public void map(BytesWritable key, BytesWritable value, Context context)

Now I am trying to convert the key back to IntWritable but I am getting a NumberFormatException. Looks like I am doing something fundamentally wrong.

new IntWritable(Integer.parseInt(new String(key.getBytes())));

Upvotes: 0

Views: 921

Answers (1)

Simplefish

Reputation: 1130

The BytesWritable class exposes the raw binary representation of the data (whatever type it happens to be). If you stored numbers, the raw bytes are determined by the class that serialized them. They almost certainly won't look like text such as "123", which is what parseInt expects. More likely they are some chunk of bytes like 1A34E56C... etc., depending on the output serialization format.
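To make that concrete, here is a minimal sketch using only the JDK, so it runs without Hadoop on the classpath. It assumes the key was written by IntWritable, which serializes its value as 4 big-endian bytes via DataOutput.writeInt. The class and method names (RawIntKeyDemo, serializeInt, decodeInt) are made up for illustration and are not part of any Hadoop API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical demo class, not part of Hadoop.
class RawIntKeyDemo {

    // Stand-in for the bytes IntWritable writes: a 4-byte big-endian int
    // (ByteBuffer is big-endian by default, matching DataOutput.writeInt).
    static byte[] serializeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decode an int from the first 4 valid bytes of a buffer.
    // `length` plays the role of BytesWritable.getLength(): the backing
    // array may be larger than the valid data, so it must not be ignored.
    static int decodeInt(byte[] buffer, int length) {
        if (length < 4) {
            throw new IllegalArgumentException("need at least 4 bytes, got " + length);
        }
        return ByteBuffer.wrap(buffer, 0, length).getInt();
    }

    // Mirrors the approach from the question: treat the raw bytes as text.
    static boolean parsesAsText(byte[] raw) {
        try {
            Integer.parseInt(new String(raw));
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] raw = serializeInt(123);

        // Simulate a backing array that is larger than the valid data.
        byte[] padded = Arrays.copyOf(raw, 16);

        System.out.println("decoded key: " + decodeInt(padded, raw.length));
        System.out.println("parses as text: " + parsesAsText(raw));
    }
}
```

Inside the Mapper, the equivalent call would be something like decodeInt(key.getBytes(), key.getLength()). The ArrayWritable value would need its own decoding, which depends on the element type it was written with.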

If your data is actually text, you're probably better off with just TextInputFormat. On the other hand, if you know the key and value types of your file, plain SequenceFileInputFormat is better, since it hands the Mapper already-deserialized keys and values. SequenceFileAsBinaryInputFormat is good for when you need to access the raw representation of the data on disk (e.g., if you're missing a class to deserialize the data, and need to provide one yourself).

Upvotes: 1
