orbital

Reputation: 1033

Paragraph processing for Hadoop

Is it possible to have whole paragraphs of text passed to a Mapper class, instead of one line at a time? I am looking for a ParagraphRecordReader implementation.

Upvotes: 4

Views: 474

Answers (1)

Harsh J

Reputation: 1494

The answer at https://stackoverflow.com/a/5398215/1660002 partially addresses this requirement. A simpler option, however, is to set the configuration parameter textinputformat.record.delimiter to a double-newline string (for example, "\n\n"). Each blank-line-separated paragraph is then delivered to the Mapper as a single record.
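
To illustrate what a "\n\n" delimiter does to the input, here is a small plain-Java sketch that approximates the splitting in memory. It is not Hadoop code: the class ParagraphSplitDemo and its splitRecords method are hypothetical names used only for this illustration. The comment at the top shows how the parameter would be set on a real job through Hadoop's Configuration.set, assuming the newer org.apache.hadoop.mapreduce API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical demo class (not part of Hadoop): a plain-Java approximation of how
 * a custom record delimiter splits input text into records.
 *
 * In a real job, the delimiter would instead be set on the configuration
 * before the Job is created, e.g.:
 *   Configuration conf = new Configuration();
 *   conf.set("textinputformat.record.delimiter", "\n\n");
 *   Job job = Job.getInstance(conf);
 * With TextInputFormat, each record then reaches the Mapper as one Text value.
 */
public class ParagraphSplitDemo {

    /** Splits input on every occurrence of delimiter; the delimiter itself is dropped. */
    public static List<String> splitRecords(String input, String delimiter) {
        if (delimiter.isEmpty()) {
            // An empty delimiter would never advance the scan position.
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        while (start < input.length()) {
            int end = input.indexOf(delimiter, start);
            if (end < 0) {
                // No more delimiters: the remainder is the last record.
                records.add(input.substring(start));
                break;
            }
            // Text between two delimiters (possibly empty) becomes one record.
            records.add(input.substring(start, end));
            start = end + delimiter.length();
        }
        // A delimiter at the very end of the input does not add an extra empty record.
        return records;
    }

    public static void main(String[] args) {
        String text = "First paragraph,\nstill the first.\n\nSecond paragraph.\n";
        for (String record : splitRecords(text, "\n\n")) {
            // Brackets make each record's boundaries visible.
            System.out.println("[" + record + "]");
        }
    }
}
```

Note that in this sketch, paragraphs separated by more than one blank line produce empty records (or records with a leading newline), so a Mapper may want to skip or trim such values defensively.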

This setting is available in the Apache Hadoop 0.23.x and 2.x releases, and also in Cloudera's CDH3 and CDH4 releases, if you use those.

Upvotes: 1
