AST

How does Hadoop MapReduce WordCount take input as <key, value> pairs?

How does the WordCount MapReduce application take its input as a set of <key, value> pairs? It seems to take a set of words as input instead.

From the Apache Hadoop MapReduce Tutorial:

  1. "The MapReduce framework operates exclusively on <key, value> pairs, that is, the framework views the input to the job as a set of <key, value> pairs..."

  2. "(input) <k1, v1> -> map"

Answers (1)

Phani Rahul

The tutorial hasn't gone into those details at that point. Every MapReduce job has an InputFormat and an OutputFormat defined for it, either explicitly or by default.

An InputFormat defines what the key and the value are for each record.

A RecordReader, which the InputFormat supplies, defines what counts as a record in a given input file (there is a little more to it than this).
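To make the idea of a record concrete, here is a minimal plain-Python sketch of a line-oriented record reader. It is not Hadoop code: the function name `read_line_records` is hypothetical, and it assumes behavior similar to Hadoop's default line reader, where each record is one line keyed by the byte offset at which that line starts, and a line may end in `\n`, `\r`, or `\r\n`.

```python
def read_line_records(data: bytes) -> list[tuple[int, str]]:
    """Turn raw file bytes into (byte_offset, line_text) records.

    Hypothetical sketch: mimics a line-based RecordReader, assuming UTF-8
    text and \\n, \\r or \\r\\n line terminators.
    """
    records = []
    offset = 0
    # keepends=True keeps the terminator so the byte offsets stay exact
    for raw in data.splitlines(keepends=True):
        line = raw.rstrip(b"\r\n")  # the value excludes the line terminator
        records.append((offset, line.decode("utf-8")))
        offset += len(raw)  # next record starts after this line's bytes
    return records


print(read_line_records(b"Hello World Bye World\nHello Hadoop Goodbye Hadoop\n"))
# [(0, 'Hello World Bye World'), (22, 'Hello Hadoop Goodbye Hadoop')]
```

The second key is 22 because the first line is 21 bytes plus one byte for its `\n`.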

In the WordCount program the InputFormat is the default, TextInputFormat, which produces a LongWritable key and a Text value for every record, and by default every record is one line of the input. The key is the byte offset of the line within the file and the value is the text of the line. The mapper ignores the key and splits the value into words, which is why the input looks like a set of words. I think you missed this part of the tutorial.
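The flow from <offset, line> pairs to word counts can be sketched in plain Python. This is a simulation of the MapReduce data flow, not Hadoop's Java API; `wordcount_map`, `wordcount_reduce` and `run_wordcount` are hypothetical names, the input records are hard-coded as (byte offset, line) pairs like TextInputFormat would produce, and `str.split()` stands in (roughly) for the whitespace tokenizing the Java WordCount does.

```python
from collections import defaultdict


def wordcount_map(key: int, value: str):
    """Map step: receives (byte offset, line). The key is ignored."""
    for word in value.split():
        yield word, 1  # emit <word, 1> for every token in the line


def wordcount_reduce(word: str, counts: list[int]):
    """Reduce step: receives <word, [1, 1, ...]> and emits <word, total>."""
    yield word, sum(counts)


def run_wordcount(records):
    # Shuffle/group: collect all values emitted for the same map output key
    grouped = defaultdict(list)
    for key, value in records:
        for word, one in wordcount_map(key, value):
            grouped[word].append(one)
    result = {}
    for word in sorted(grouped):  # keys reach the reducer in sorted order
        for out_key, total in wordcount_reduce(word, grouped[word]):
            result[out_key] = total
    return result


# Hard-coded <LongWritable, Text>-style input records: (byte offset, line)
records = [(0, "Hello World Bye World"), (22, "Hello Hadoop Goodbye Hadoop")]
counts = run_wordcount(records)
print(counts)
# {'Bye': 1, 'Goodbye': 1, 'Hadoop': 2, 'Hello': 2, 'World': 2}
```

The offsets 0 and 22 never influence the result: the mapper does receive <key, value> pairs, it just only uses the value.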
