How does Hadoop MapReduce WordCount take input as <key, value> pairs?

Question

How does the WordCount MapReduce application take its input as a set of <key, value> pairs? It looks as though it takes a set of words as input instead.

From the Apache Hadoop MapReduce Tutorial:

"The MapReduce framework operates exclusively on <key, value> pairs, that is, the framework views the input to the job as a set of <key, value> pairs..."
"(input) <k1, v1> -> map"

Phani Rahul · Accepted Answer

The tutorial hasn't gone into the details at that point. Every MapReduce program has an InputFormat and an OutputFormat defined for it.

An InputFormat defines what the key and the value are for a given record.

A RecordReader defines what a record is in a given input file (there is a little more to this).

In the WordCount program, the default InputFormat is TextInputFormat, which uses LongWritable as the key type and Text as the value type for every record, and by default every record in this program is a line. The key is the byte offset of the line within the file and the value is the line of text. I think you missed this part of the tutorial.
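
To make the <byte offset, line> idea concrete, here is a minimal plain-Java sketch of roughly what the mapper sees. It does not use any Hadoop classes: `long` and `String` stand in for LongWritable and Text, and the names `WordCountInputSketch`, `readRecords`, and `map` are made up for illustration. It also assumes single-byte `\n` line endings, which the real record reader does not require.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java sketch, no Hadoop classes: long stands in for LongWritable,
// String stands in for Text. All names here are hypothetical.
class WordCountInputSketch {

    // Mimics line-oriented record reading: one <byte offset, line> pair per line.
    static List<Map.Entry<Long, String>> readRecords(String input) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : input.split("\n")) {
            records.add(Map.entry(offset, line));
            // Advance past the line's bytes plus the newline terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Mimics WordCount's map step: the offset key is ignored, and each word
    // in the line value is emitted as a <word, 1> pair.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(Map.entry(tokens.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, String> record : readRecords("Hello World\nBye World\n")) {
            System.out.println("input  <" + record.getKey() + ", \"" + record.getValue() + "\">");
            for (Map.Entry<String, Integer> pair : map(record.getKey(), record.getValue())) {
                System.out.println("  map -> <" + pair.getKey() + ", " + pair.getValue() + ">");
            }
        }
    }
}
```

For the two-line input above, the records are <0, "Hello World"> and <12, "Bye World">. So the input to map really is a set of <key, value> pairs; WordCount just ignores the key and only splits the value into words.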
