Reputation: 125
Can somebody give me a practical scenario for when to use KeyValueTextInputFormat and when to use TextInputFormat?
Upvotes: 2
Views: 3695
Reputation: 3173
The TextInputFormat class converts every line of the source file into a key/value pair in which the LongWritable key is the byte offset of the record within the file and the Text value is the entire record.
KeyValueTextInputFormat is a variant of TextInputFormat that is useful when every source record should be read as a Text/Text pair, with the key and value obtained by splitting the record at the first occurrence of a fixed delimiter (a tab by default).
Consider the following file contents:
AL#Alabama
AR#Arkansas
FL#Florida
If TextInputFormat is configured, you would see the key/value pairs as follows (assuming each line ends with a single newline byte):
0 AL#Alabama
11 AR#Arkansas
23 FL#Florida
If KeyValueTextInputFormat is configured with conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "#"), you would see the results as:
AL Alabama
AR Arkansas
FL Florida
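To make the difference concrete without a cluster, here is a minimal plain-Java sketch that imitates the two behaviours on the sample file. It does not use the Hadoop API: the class name InputFormatSketch and both helper methods are made up for illustration, keys and values are joined with a tab only for printing, and the offsets assume UTF-8 text with one newline byte per line.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class InputFormatSketch {

    // Imitates TextInputFormat: key = byte offset of the line start, value = whole line.
    static List<String> asOffsetAndLine(String fileContents) {
        List<String> records = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            records.add(offset + "\t" + line);
            // Advance past the line plus its one-byte '\n' terminator (assumption).
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Imitates KeyValueTextInputFormat: split each line at the first separator.
    static List<String> asKeyAndValue(String fileContents, char separator) {
        List<String> records = new ArrayList<>();
        for (String line : fileContents.split("\n")) {
            int pos = line.indexOf(separator);
            // No separator: the whole line becomes the key and the value is empty.
            String key = pos < 0 ? line : line.substring(0, pos);
            String value = pos < 0 ? "" : line.substring(pos + 1);
            records.add(key + "\t" + value);
        }
        return records;
    }

    public static void main(String[] args) {
        String contents = "AL#Alabama\nAR#Arkansas\nFL#Florida\n";
        asOffsetAndLine(contents).forEach(System.out::println);
        asKeyAndValue(contents, '#').forEach(System.out::println);
    }
}
```

In a real job the mapper would receive LongWritable/Text pairs in the first case and Text/Text pairs in the second, so the mapper's input key type has to match the input format you choose.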
Upvotes: 7
Reputation: 1810
KeyValueTextInputFormat lets you take the key from the input file itself, whereas TextInputFormat has a fixed key, which is the byte offset of the line.
Set the separator for KeyValueTextInputFormat using:
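import org.apache.hadoop.conf.Configuration;
Configuration conf = new Configuration();
// Only the first character of the separator string is used; the default is a tab.
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");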
An example of where you can use KeyValueTextInputFormat:
You get a file that is comma-separated (or separated by some other single byte) and you know the first column can act as the key. Let's say a CSV of salaries with the name or employee ID as the first column and the salary as the second.
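One detail matters for CSV input: the line is split only at the first separator, so any further commas stay inside the value, and a line with no separator yields the whole line as the key and an empty value. The plain-Java sketch below imitates that rule; it is not Hadoop code, and the class name, the helper splitAtFirst, and the sample rows are invented for illustration.

```java
public class FirstSeparatorSketch {

    // Returns {key, value}, splitting only at the first occurrence of the separator.
    static String[] splitAtFirst(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator found: whole line is the key, value is empty.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // Hypothetical salary rows: two columns, three columns, and no separator.
        String[] samples = { "E101,52000", "E102,Alice,61000", "E103" };
        for (String line : samples) {
            String[] kv = splitAtFirst(line, ',');
            System.out.println("key=" + kv[0] + " value=" + kv[1]);
        }
    }
}
```

So if the salary file has more than two columns, the mapper still has to split the value itself to reach the salary field.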
Also refer to this post: How to specify KeyValueTextInputFormat Separator
Upvotes: 0