Reputation: 211
How does the WordCount MapReduce application take its input as a set of <key, value> pairs? It seems to take a set of words as input instead.
From the Apache Hadoop MapReduce Tutorial:
"The MapReduce framework operates exclusively on <key, value> pairs, that is, the framework views the input to the job as a set of <key, value> pairs..."
"(input) <k1, v1> -> map"
Upvotes: 1
Views: 517
Reputation: 860
The tutorial hasn't gone into the details at that point. Every MapReduce program has an InputFormat and an OutputFormat defined for it.
An InputFormat defines what the key and the value are for a given record.
A RecordReader defines what a record is within a given input file. (There is a little more to this.)
In the WordCount program, the default InputFormat is TextInputFormat, which produces a LongWritable key and a Text value for every record, and every record in this program is a line (by default). The key is the byte offset of the line in the file and the value is the line of text. I think you missed this part of the tutorial.
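To make the byte-offset keys concrete, here is a minimal sketch in plain Java that mimics what the mapper sees for a small text input. It does not use Hadoop at all: the class OffsetRecords and the method toRecords are hypothetical names made up for illustration, and the sketch assumes UTF-8 text with '\n' line endings only, which is simpler than what Hadoop's own record reader handles.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Hadoop.
class OffsetRecords {

    // Splits the input on '\n' and pairs each line with the byte offset
    // at which it starts, similar in spirit to the <LongWritable, Text>
    // records described above. Assumes UTF-8 and '\n' line endings.
    static List<Map.Entry<Long, String>> toRecords(String input) {
        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                String line = new String(bytes, start, i - start, StandardCharsets.UTF_8);
                records.add(new SimpleEntry<Long, String>(Long.valueOf(start), line));
                start = i + 1; // next record begins after the newline
            }
        }
        // Emit a final line that has no trailing newline.
        if (start < bytes.length) {
            String line = new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8);
            records.add(new SimpleEntry<Long, String>(Long.valueOf(start), line));
        }
        return records;
    }

    public static void main(String[] args) {
        // Prints <0, Hello World> and then <12, Bye World>
        for (Map.Entry<Long, String> r : toRecords("Hello World\nBye World\n")) {
            System.out.println("<" + r.getKey() + ", " + r.getValue() + ">");
        }
    }
}
```

Each pair printed here corresponds to one call of the map function: WordCount ignores the key (the offset) and splits the value (the line) into words.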
Upvotes: 2