Reputation: 125

What are the main differences between KeyValueTextInputFormat and TextInputFormat in Hadoop?

Can somebody give me a practical scenario where each of KeyValueTextInputFormat and TextInputFormat would be used?

Upvotes: 2

Answers (2)

suresiva

Reputation: 3173

The TextInputFormat class turns every line of the source file into a key/value pair, where the LongWritable key is the byte offset at which the line starts in the file and the Text value is the entire line itself.

KeyValueTextInputFormat is the line-based counterpart of TextInputFormat for cases where every source record should arrive as a Text/Text pair. The key and value are produced by splitting each line at the first occurrence of a fixed separator character (a tab by default).

Consider the following file contents:

AL#Alabama
AR#Arkansas
FL#Florida

If TextInputFormat is configured, you would see the following key/value pairs (assuming each line ends with a single-byte \n):

0    AL#Alabama
11   AR#Arkansas
23   FL#Florida
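
To make the offset keys concrete, here is a minimal plain-Java sketch that does not use any Hadoop classes. The class `OffsetDemo` and the helper `offsetRecords` are hypothetical names made up for illustration, and the sketch assumes UTF-8 text with single-byte `\n` line endings. Each key is simply the number of bytes that precede the line in the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class OffsetDemo {
    // Hypothetical helper (not part of Hadoop): pairs each line with the byte
    // offset at which it starts, mimicking the keys TextInputFormat produces.
    static List<String> offsetRecords(String content) {
        List<String> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(offset + "\t" + line);
            // Advance past the line's bytes plus the one-byte '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String content = "AL#Alabama\nAR#Arkansas\nFL#Florida\n";
        offsetRecords(content).forEach(System.out::println);
    }
}
```

In a mapper these offsets are rarely useful by themselves, which is why TextInputFormat jobs usually ignore the key and parse the value.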

If KeyValueTextInputFormat is configured with conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "#"), you would see the results as:

AL    Alabama
AR    Arkansas
FL    Florida
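
The splitting rule can be sketched in plain Java as well. This is only an illustration of the behaviour, not Hadoop's own code: `KeyValueSplitDemo` and `splitKeyValue` are hypothetical names. The sketch assumes the documented rule that a line is split at the first separator, and that a line without any separator becomes the key with an empty value.

```java
public class KeyValueSplitDemo {
    // Hypothetical helper (not part of Hadoop): splits a line at the FIRST
    // separator. If the separator is absent, the whole line is the key and
    // the value is empty.
    static String[] splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        String[] lines = {"AL#Alabama", "AR#Arkansas", "FL#Florida", "no separator here"};
        for (String line : lines) {
            String[] kv = splitKeyValue(line, '#');
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

Because only the first separator counts, a value that itself contains the separator character stays intact in the value part.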

Upvotes: 7

Venkat

Reputation: 1810

KeyValueTextInputFormat lets you take the key from the input file itself, whereas TextInputFormat has a fixed key, which is the byte offset of the line.

Set the separator for KeyValueTextInputFormat using:

    Configuration conf = new Configuration();
    // Set this before creating the Job: Job.getInstance(conf) takes a copy of the Configuration.
    conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

An example of where you can use KeyValueTextInputFormat:

You get a file that is separated by a comma (or some other single byte) and you know the first column can act as the key. Let's say a CSV of salaries with the name or employee ID as the first column and the salary as the second column.
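
As a rough illustration of that scenario, the plain-Java sketch below simulates what a job would see when each line arrives as an (employee ID, salary) pair and the salaries are summed per key. It does not use Hadoop. The class `SalaryByKeyDemo`, the helper `totalByEmployee`, and the sample rows are all hypothetical, and the sketch assumes whole-number salaries in a two-column file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SalaryByKeyDemo {
    // Hypothetical helper (not part of Hadoop): treats the text before the first
    // comma as the key and the rest as a whole-number salary, then sums per key.
    static Map<String, Long> totalByEmployee(String[] lines) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String line : lines) {
            int pos = line.indexOf(',');
            if (pos < 0) {
                continue; // no separator: nothing to use as a salary, so skip the line
            }
            String employeeId = line.substring(0, pos);
            long salary = Long.parseLong(line.substring(pos + 1).trim());
            totals.merge(employeeId, salary, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        String[] lines = {"E100,5000", "E200,7000", "E100,1500"};
        totalByEmployee(lines).forEach((id, total) -> System.out.println(id + "\t" + total));
    }
}
```

With TextInputFormat the same job would have to perform this split inside the mapper, because the key it receives is only the byte offset.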

Also refer to this post: How to specify KeyValueTextInputFormat Separator

Upvotes: 0
