dmytrivv

Reputation: 608

Hadoop mapper reading from 2 different source input files

I have a tool which chains a lot of Mappers & Reducers, and at some point I need to merge results from previous map-reduce steps. For example, as input I have two files with data:

/input/a.txt
apple,10
orange,20

/input/b.txt
apple;5
orange;40

The result should be c.txt, where c.value = a.value * b.value:

/output/c.txt
apple,50   // 10 * 5
orange,800 // 20 * 40

How could this be done? I've resolved it by introducing a simple Key => MyMapWritable (type = 1 or 2, plus the value) and merging (actually, multiplying) the data in the reducer, sketched below. It works, but:

  1. I have a feeling it could be done more simply (it smells bad)
  2. Is it possible to know inside the Mapper which file the current record came from (a.txt or b.txt)? For now, I just use different separators: comma & semicolon :(
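For reference, here's a minimal sketch of what my current approach boils down to (names are simplified; getType()/getValue() stand in for whatever my real MyMapWritable exposes):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// MyMapWritable carries (type, value): type 1 = record from a.txt, type 2 = from b.txt
public static class MergeReducer
        extends Reducer<Text, MyMapWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<MyMapWritable> values, Context ctx)
            throws IOException, InterruptedException {
        int a = 1, b = 1;
        for (MyMapWritable v : values) {
            if (v.getType() == 1) {
                a = v.getValue();    // value parsed from a.txt
            } else {
                b = v.getValue();    // value parsed from b.txt
            }
        }
        ctx.write(key, new IntWritable(a * b));  // c.value = a.value * b.value
    }
}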

Upvotes: 3

Views: 6976

Answers (2)

Ashish

Reputation: 5791

// The InputSplit tells you which file the current record came from;
// FileSplit here is org.apache.hadoop.mapreduce.lib.input.FileSplit (new API)
String fileName = ((FileSplit) context.getInputSplit()).getPath()
                .toString();

if (fileName.contains("file_1")) {    // e.g. "a.txt" in your case
   //TODO for file 1
} else {                              // e.g. "b.txt"
   //TODO for file 2
}

Upvotes: 1

Chris White

Reputation: 30089

Assuming the two inputs have been partitioned and sorted in the same way, you can use CompositeInputFormat to perform a map-side join. There's an article on using it here. I don't think it's been ported to the new mapreduce API, though.
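For completeness, here's roughly what that looks like with the old mapred API (a sketch, not tested; KeyValueTextInputFormat splits key from value on a tab by default, so your comma/semicolon inputs would need reformatting or a configured separator, and both files must be sorted and partitioned identically):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.join.CompositeInputFormat;
import org.apache.hadoop.mapred.join.TupleWritable;

public class MultiplyJoin {

    public static class MultiplyMapper extends MapReduceBase
            implements Mapper<Text, TupleWritable, Text, IntWritable> {
        public void map(Text key, TupleWritable value,
                OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            // Tuple positions follow compose() order: 0 = a.txt, 1 = b.txt
            int a = Integer.parseInt(value.get(0).toString());
            int b = Integer.parseInt(value.get(1).toString());
            out.collect(key, new IntWritable(a * b));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MultiplyJoin.class);
        conf.setInputFormat(CompositeInputFormat.class);
        conf.set("mapred.join.expr", CompositeInputFormat.compose(
                "inner", KeyValueTextInputFormat.class,
                new Path("/input/a.txt"), new Path("/input/b.txt")));
        conf.setMapperClass(MultiplyMapper.class);
        conf.setNumReduceTasks(0);    // pure map-side join, no reduce step
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(conf, new Path("/output"));
        JobClient.runJob(conf);
    }
}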

Secondly, you can get the input file in the mapper by calling context.getInputSplit(). This returns the InputSplit, which, if you're using TextInputFormat, you can cast to a FileSplit and then call getPath() to get the file name. I don't think you can use this method with CompositeInputFormat, though, as you won't know which source each Writable in the TupleWritable came from.

Upvotes: 3
