Reputation: 1033
Is it possible to have text passed to a Mapper class paragraph by paragraph instead of line by line? I am looking for a ParagraphRecordReader implementation.
Upvotes: 4
Views: 474
Reputation: 1494
The answer at https://stackoverflow.com/a/5398215/1660002 more or less covers this requirement. However, you can also simply set the configuration parameter textinputformat.record.delimiter to a double-newline string (for example, "\n\n") to solve this.
This configurable delimiter is available in the Apache Hadoop 0.23.x and 2.x releases, and also in Cloudera's CDH3 and CDH4 releases, if you use those.
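To get a feel for what each map() call would receive with a "\n\n" delimiter, here is a minimal plain-Java sketch with no Hadoop dependency. The class ParagraphSplitDemo and the method splitRecords are hypothetical names made up for this illustration; the method only approximates literal-delimiter record splitting and is not Hadoop's actual reader code. In a real job you would instead call conf.set("textinputformat.record.delimiter", "\n\n") on the job's Configuration before submitting it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: approximates how a literal record delimiter
// cuts input text into records. Not part of Hadoop.
class ParagraphSplitDemo {

    // Splits text on every literal occurrence of delimiter (no regex).
    // A trailing chunk with no closing delimiter is kept as a final record;
    // nothing is emitted if the text ends exactly on a delimiter.
    static List<String> splitRecords(String text, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = text.indexOf(delimiter, start)) >= 0) {
            records.add(text.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < text.length()) {
            records.add(text.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "First paragraph, line one.\n"
                + "First paragraph, line two.\n"
                + "\n"
                + "Second paragraph.\n";

        // Same delimiter value you would give textinputformat.record.delimiter.
        List<String> records = splitRecords(input, "\n\n");

        // Print each record on one line, with embedded newlines made visible.
        for (int i = 0; i < records.size(); i++) {
            System.out.println("record " + i + ": "
                    + records.get(i).replace("\n", "\\n"));
        }
    }
}
```

Note that single newlines inside a paragraph stay in the record, so the mapper sees multi-line values and has to handle the embedded "\n" characters itself.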
Upvotes: 1