Kaushik Lele

Reputation: 6637

How does Hadoop RecordReader identify records?

When processing a text file, how does Hadoop identify records? Is it based on newline characters or full stops?

Suppose I have a text file containing a list of 5000 words, all on a single line, separated by spaces, with no newline characters, commas, or full stops. How will RecordReader behave?

e.g. abc pqr xyz lmn qwe rew poio kjkh ascd lkyg ......

Upvotes: 1

Views: 99

Answers (1)

Thomas Jungblut

Reputation: 20969

You can set the record delimiter in the job configuration with the textinputformat.record.delimiter property.

If it isn't supplied, the reader falls back to splitting on line endings, treating any of the following as the end of a record: '\n' (LF), '\r' (CR), or '\r\n' (CR+LF). Your example contains none of these, so the whole line will be read as a single record.

You can read through the code of the LineReader, TextInputFormat and LineRecordReader for more details.
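
To make the difference concrete, here is a rough plain-Java sketch of the two behaviours described above. It is not Hadoop's actual implementation and does not use the Hadoop API; the class RecordSplitSketch and its splitRecords method are made-up names for illustration, and it works on an in-memory String rather than on input splits. In a real job you would leave the splitting to TextInputFormat and only set the property on the job's Configuration, for example conf.set("textinputformat.record.delimiter", " ").

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: mimics how records are cut
// with and without a custom delimiter. Not part of Hadoop.
final class RecordSplitSketch {

    // delimiter == null mimics the default: a record ends at LF, CR, or CR+LF.
    // A non-null delimiter mimics setting textinputformat.record.delimiter.
    static List<String> splitRecords(String input, String delimiter) {
        if (delimiter != null && delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < input.length()) {
            int len = delimiterLengthAt(input, i, delimiter);
            if (len > 0) {
                // Everything since the previous delimiter is one record.
                records.add(input.substring(start, i));
                i += len;
                start = i;
            } else {
                i++;
            }
        }
        // Trailing text without a final delimiter still forms a record.
        if (start < input.length()) {
            records.add(input.substring(start));
        }
        return records;
    }

    // Returns the length of the delimiter starting at position i, or 0 if none.
    private static int delimiterLengthAt(String s, int i, String delimiter) {
        if (delimiter != null) {
            return s.startsWith(delimiter, i) ? delimiter.length() : 0;
        }
        char c = s.charAt(i);
        if (c == '\n') {
            return 1;
        }
        if (c == '\r') {
            // Treat CR+LF as a single line ending.
            return (i + 1 < s.length() && s.charAt(i + 1) == '\n') ? 2 : 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        String words = "abc pqr xyz lmn qwe rew poio kjkh ascd lkyg";
        // Default behaviour: no line endings in the input, so one record.
        System.out.println(splitRecords(words, null).size());
        // Space as the delimiter: every word becomes its own record.
        System.out.println(splitRecords(words, " ").size());
    }
}
```

With the default (null) delimiter the sample line from the question comes back as one record; with a single space as the delimiter each word becomes a separate record, which is roughly what you would get from the real reader after setting the property to " ".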

Upvotes: 1
