Reputation: 6637
When processing a text file, how does Hadoop identify records? Is it based on newline characters or full stops?
Suppose I have a text file containing a list of 5000 words, all on a single line, separated by spaces, with no newline characters, commas, or full stops. How will RecordReader behave?
e.g. abc pqr xyz lmn qwe rew poio kjkh ascd lkyg ......
Upvotes: 1
Views: 99
Reputation: 20969
You can set the record delimiter in the job configuration with the textinputformat.record.delimiter property.
If it isn't supplied, the reader falls back to splitting records on any of the following line endings: '\n' (LF), '\r' (CR), or '\r\n' (CR+LF).
So with the default settings, your example line will be read as a single record, since it contains no line endings.
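To make the difference concrete, here is a small stand-alone sketch in plain Java. It is not Hadoop's LineReader, just a toy imitation of the two splitting modes described above; the class name RecordSplitDemo and both method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: mimics the two splitting modes, not Hadoop's actual LineReader.
public class RecordSplitDemo {

    // Default mode: a record ends at LF, CR, or CR+LF.
    public static List<String> splitOnLineEndings(String data) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length()) {
            char c = data.charAt(i);
            if (c == '\n' || c == '\r') {
                records.add(data.substring(start, i));
                // Treat CR immediately followed by LF as a single terminator.
                if (c == '\r' && i + 1 < data.length() && data.charAt(i + 1) == '\n') {
                    i++;
                }
                start = i + 1;
            }
            i++;
        }
        // Trailing data without a terminator still forms a record.
        if (start < data.length()) {
            records.add(data.substring(start));
        }
        return records;
    }

    // Custom mode: a record ends wherever the given delimiter string occurs.
    public static List<String> splitOnDelimiter(String data, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = data.indexOf(delimiter, start)) >= 0) {
            records.add(data.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < data.length()) {
            records.add(data.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "abc pqr xyz lmn qwe";
        // No line endings in the input, so the whole string is one record.
        System.out.println(splitOnLineEndings(input).size());
        // Splitting on a space yields one record per word.
        System.out.println(splitOnDelimiter(input, " ").size());
    }
}
```

With the five-word sample, the first call yields one record and the second yields five, which is the same contrast you would see between the default behaviour and a space delimiter on your 5000-word file.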
You can read through the code of the LineReader, TextInputFormat and LineRecordReader for more details.
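If you want each word to arrive as its own record, something along these lines should work with the new (org.apache.hadoop.mapreduce) API. Treat it as a rough sketch: the class name SpaceDelimitedDriver, the job name, and the use of args[0] and args[1] for the input and output paths are my own assumptions, and it relies on the default identity mapper and reducer just to show the wiring.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class; only the delimiter property comes from the answer above.
public class SpaceDelimitedDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Set the delimiter before creating the Job, because the Job copies the configuration.
        conf.set("textinputformat.record.delimiter", " ");

        Job job = Job.getInstance(conf, "space-delimited-records");
        job.setJarByClass(SpaceDelimitedDriver.class);
        job.setInputFormatClass(TextInputFormat.class);

        // TextInputFormat produces (byte offset, record text) pairs.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Assumed command line: <input path> <output path>.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

As far as I can tell, once a custom delimiter is set it replaces the default line-ending handling, so newlines in the input would no longer end a record.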
Upvotes: 1