Reputation: 229
I am new to Hadoop and have a question about the mapper parameters in the word-count example. See the code snippet below:
public static class TokenizerMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    .....
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        .......
    }
}
I know the "value" parameter is the line read from the file, but what does the "key" parameter mean? What does it correspond to?
Why is its type LongWritable?
I have wasted several hours searching the docs; could anyone help?
Upvotes: 1
Views: 538
Reputation: 5538
The key is of type LongWritable because the wordcount program reads its input with TextInputFormat.
As per the Javadoc for TextInputFormat:
An InputFormat for plain text files. Files are broken into lines. Either linefeed or carriage-return are used to signal end of line. Keys are the position in the file, and values are the line of text.
Here, "position" means the byte offset of the start of the line. For example, suppose your text is
We are fine.
How are you?
All are fine.
Then the input to the mapper is
Key: 0
Value: We are fine.
Key: 13
Value: How are you?
(the first line has 12 characters plus a newline, i.e. 13 bytes, so the second line starts at offset 13)
Key: 26
Value: All are fine.
(the second line is another 13 bytes, so the third line starts at offset 26 from the start of the file)
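You can verify these offsets without running a Hadoop job. Below is a minimal sketch in plain Java (no Hadoop dependencies; the class and method names are made up for illustration) that computes each line's starting byte offset the same way TextInputFormat derives the LongWritable key:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineOffsets {

    // Compute the byte offset of each line's first character -
    // the value TextInputFormat hands to the mapper as the key.
    static List<Long> offsets(String text) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            result.add(offset);
            // advance past the line's bytes plus the '\n' terminator
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "We are fine.\nHow are you?\nAll are fine.";
        List<Long> keys = offsets(text);
        for (int i = 0; i < keys.size(); i++) {
            System.out.println("Key: " + keys.get(i));
        }
        // prints Key: 0, Key: 13, Key: 26
    }
}
```

Note the offsets are byte offsets, not line numbers, which is also why the key type is LongWritable rather than IntWritable: a large file can exceed the range of an int.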
Upvotes: 2