Ragit

Reputation: 125

Hadoop: how to calculate the hash of an input split

I want to calculate the SHA-256 hash of each input split. The output of the mapper should be

(key, value), where the key is the location of the start of the block and the value is the SHA-256 hash of the complete block.

My requirement is to read the complete input split as one record.

Here is what I have done so far (I have taken the block size as 100 KB). I have not worked on the value part of the (key, value) pair yet; right now I am just outputting 1.

public void map(LongWritable key, Text value,
      OutputCollector<LongWritable, IntWritable> output, Reporter reporter) throws IOException {

     // count and one are fields of the mapper class (declarations not shown)
     String line = value.toString();
     long block = 0;
     if (count == 0) {
       // first record only: turn its offset into a block number (100 KB blocks)
       block = key.get() / 100000;
       count++;
     }
     output.collect(new LongWritable(block), one);
}

Upvotes: 2

Views: 943

Answers (1)

Chris White

Reputation: 30089

Can you amend the WholeFileInputFormat from Hadoop: The Definitive Guide so that, rather than passing the entire file contents as a BytesWritable value, it calculates the SHA-256 and passes that as the value? You should just need to amend the WholeFileRecordReader.next() method, replacing the IOUtils.readFully call with a method that calculates the SHA-256 of the file bytes.
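
A minimal sketch of such a hashing method, using only the JDK's java.security.MessageDigest. The class name SplitHasher and its method names are hypothetical; they are not part of Hadoop or of the book's example. Inside the record reader you would presumably pass it the stream returned by FileSystem.open(), which is an ordinary InputStream, and wrap the resulting string in a Text value.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Hadoop: hashes bytes or a stream with SHA-256.
final class SplitHasher {

    private SplitHasher() {}

    // SHA-256 is required on every Java platform, so a failure here is unexpected.
    private static MessageDigest newDigest() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Converts the raw digest bytes to a lowercase hex string.
    private static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Hashes a byte array that is already in memory.
    static String sha256Hex(byte[] data) {
        return toHex(newDigest().digest(data));
    }

    // Streams the input through the digest in 8 KB chunks, so the whole
    // file never has to be held in memory at once.
    static String sha256Hex(InputStream in) throws IOException {
        MessageDigest digest = newDigest();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        return toHex(digest.digest());
    }
}
```

The streaming overload is the one that would stand in for IOUtils.readFully, since it avoids buffering the complete file before hashing it.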

Upvotes: 0
