CodeReaper

Reputation: 387

Hadoop Input Splits and Record Reader

I read this in the Apache documentation:

InputSplit represents the data to be processed by an individual Mapper.

Typically, it presents a byte-oriented view on the input and is the responsibility of RecordReader of the job to process this and present a record-oriented view.

Link - https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/mapred/InputSplit.html

Can somebody explain the difference between the byte-oriented view and the record-oriented view?

Upvotes: 3

Views: 1350

Answers (1)

Thanga

Reputation: 8091

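HDFS stores a file as blocks cut purely by byte count (the byte-oriented view), so each block is at most the configured block size. The cut does not respect record boundaries, which means the last record in a block may start in that block and finish in the next one. That is fine for storage, but at processing time a mapper cannot work with a partial record. This is where the record-oriented view comes in. An InputSplit is still just a byte range of the file (by default it usually lines up with a block), and the RecordReader of the job turns that range into whole records. With plain text input, for example, the reader reads past the end of its split to finish the last record, and it skips the partial record at the start of its split because the reader of the previous split has already picked it up. So the InputSplit is the byte-oriented view, and what the RecordReader hands to the Mapper is the record-oriented view.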
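To make the two views concrete, here is a small Python simulation. It is only a sketch of the idea, not Hadoop code: the names `make_splits`, `read_records` and `next_line_start` are made up for this example, records are assumed to be newline-terminated lines, and the split size of 10 bytes is chosen just to force records to straddle split boundaries.

```python
# Simplified simulation of byte-oriented splits vs. record-oriented reading.
# Hypothetical helper names; this is not the Hadoop API.

def make_splits(total_len, split_size):
    """Cut a byte range into (start, length) pairs by size only."""
    return [(start, min(split_size, total_len - start))
            for start in range(0, total_len, split_size)]

def next_line_start(data, pos):
    """Return the offset just past the next newline at or after pos."""
    i = data.find(b"\n", pos)
    return len(data) if i == -1 else i + 1

def read_records(data, start, length):
    """Return the whole lines that belong to the split (start, length)."""
    end = start + length
    pos = start
    if start != 0:
        # Skip the first (possibly partial) line; the reader of the
        # previous split reads one line past its end and covers it.
        pos = next_line_start(data, pos)
    records = []
    # "<= end" lets a reader finish a line that crosses the split end,
    # and also take a line that begins exactly at the split end.
    while pos <= end and pos < len(data):
        nxt = next_line_start(data, pos)
        records.append(data[pos:nxt].rstrip(b"\n"))
        pos = nxt
    return records

data = b"alpha\nbravo\ncharlie\ndelta\n"
splits = make_splits(len(data), 10)   # [(0, 10), (10, 10), (20, 6)]

# Byte-oriented view: raw slices cut records in the middle.
raw = [data[s:s + n] for s, n in splits]
# raw[0] == b"alpha\nbrav", raw[1] == b"o\ncharlie\n"

# Record-oriented view: each split yields only complete lines.
per_split = [read_records(data, s, n) for s, n in splits]
# per_split[0] == [b"alpha", b"bravo"]
# per_split[1] == [b"charlie", b"delta"]
# per_split[2] == []
```

Notice that the first split ends in the middle of "bravo" but its reader still returns the whole line, and the second reader drops the leftover "o" at its start. The last split comes out empty here because its only line starts exactly on the boundary and was already taken by the previous reader; every record is still read exactly once.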

Upvotes: 4
