learningtocode

Reputation: 765

How are keys and value inputs to map function defined for SequenceFileInputFormat?

I am trying to understand an example Hadoop project. It has the following code block:

jconf.setOutputKeyClass(Text.class);
jconf.setOutputValueClass(Text.class);
jconf.setInputFormat(SequenceFileInputFormat.class);

From this link, I read that for SequenceFileInputFormat, the key and value are user defined. Do I need to implement a RecordReader for this? I don't see one implemented in the project. Are there any default delimiters that it uses to separate the input splits into key/value pairs?

Upvotes: 1

Views: 1255

Answers (2)

Chitra

Reputation: 198

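As for your question, "Is there a default delimiter it uses to identify the key?": you need not worry about that. A sequence file consists of binary key/value pairs rather than delimited text, so no delimiter is involved. The file header records the key and value classes, which is how the reader knows what types to hand to the mapper. You can use SequenceFile.Writer#append to write a key and value.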

http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/io/SequenceFile.Writer.html#append%28java.lang.Object,%20java.lang.Object%29
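
To illustrate why no delimiter is needed, here is a minimal plain-Java sketch of a length-prefixed key/value file. It is a toy model, not Hadoop's actual on-disk SequenceFile layout, and ToySequenceFile and its methods are hypothetical names made up for this example. What it shares with the real format is the idea: a header naming the key and value types, followed by binary records whose lengths are stored explicitly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Hadoop's real SequenceFile layout (hypothetical class).
class ToySequenceFile {

    // Writes a header with the key/value type names, then length-prefixed pairs.
    static byte[] write(String keyType, String valueType, String[][] pairs) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(keyType);   // types are fixed when the file is created
            out.writeUTF(valueType);
            for (String[] pair : pairs) {
                writeField(out, pair[0]);
                writeField(out, pair[1]);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the type names back from the header.
    static String[] readTypes(byte[] file) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            return new String[] { in.readUTF(), in.readUTF() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads every key/value pair; no delimiter scanning is involved.
    static List<String[]> read(byte[] file) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            in.readUTF(); // skip key type
            in.readUTF(); // skip value type
            List<String[]> pairs = new ArrayList<>();
            while (in.available() > 0) {
                String key = readField(in);
                String value = readField(in);
                pairs.add(new String[] { key, value });
            }
            return pairs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeField(DataOutputStream out, String s) throws IOException {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(raw.length); // explicit length instead of a delimiter
        out.write(raw);
    }

    private static String readField(DataInputStream in) throws IOException {
        byte[] raw = new byte[in.readInt()];
        in.readFully(raw);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] file = write("Text", "Text", new String[][] { { "a\tb", "line1\nline2" } });
        System.out.println(readTypes(file)[0] + " / " + readTypes(file)[1]);
        System.out.println(read(file).size() + " pair(s) read");
    }
}
```

Because each field carries its own length, keys and values can contain tabs or newlines without any ambiguity, which is the reason the delimiter question does not arise for sequence files.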

Upvotes: 0

Eswara Reddy Adapa

Reputation: 995

You do not have to implement a RecordReader to read a sequence file.

However, generating a sequence file is not as simple as generating a text file. Commands such as

hadoop fs -put

copy a file into HDFS as-is, so a text file you upload this way is still a text file in HDFS.

If you want to test an MR program that expects a sequence file as input, you first need to convert your text file into a sequence file and give that as input.

To create a sequence file from a text file, you can write a simple MR job with an identity mapper and no reducer. Set the input format to text and pass the text file as input; set the output format to sequence file. The output of this job will be a copy of your text file in sequence file format. Choose the output key and value types of this job keeping in mind that any subsequent MR job that uses the sequence file will have to accept them as its input key and value types. In other words, the key and value types of a sequence file are decided at the time of its creation.

Any subsequent MR job (like the one you quoted in your question) that expects a sequence file can use the above sequence file, and the key and value types of its mapper input will be the same as what you emitted earlier.
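
To make concrete what such a conversion job would store: with Hadoop's TextInputFormat, each record's key is the byte offset of a line (a LongWritable) and its value is the line itself (a Text), and an identity mapper passes these through unchanged. The plain-Java sketch below only simulates that pairing without using any Hadoop classes; TextRecordSimulation and Record are hypothetical names, and it assumes UTF-8 text with '\n' line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simulation only: mimics the (byte offset, line) records that a text-input
// job with an identity mapper would pass through. No Hadoop classes are used.
class TextRecordSimulation {

    // One record: key = byte offset of the line start, value = line content.
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Splits UTF-8 text on '\n' and pairs each line with its starting byte offset.
    static List<Record> toRecords(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        List<Record> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= raw.length; i++) {
            boolean atEnd = (i == raw.length);
            // Emit a record at each newline, and at end of input if a
            // final line without a trailing newline is still pending.
            boolean lineComplete = atEnd ? start < raw.length : raw[i] == '\n';
            if (lineComplete) {
                String line = new String(raw, start, i - start, StandardCharsets.UTF_8);
                records.add(new Record(start, line));
                start = i + 1;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        for (Record r : toRecords("alpha\nbeta\n")) {
            System.out.println(r.offset + "\t" + r.line);
        }
    }
}
```

If the sequence file were written with these types, a later job reading it through SequenceFileInputFormat would need a mapper whose input key is LongWritable and whose input value is Text. To get different types, the conversion job would need its own mapper that emits them.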

Upvotes: 2
