Cristian Perez

Reputation: 23

Intermediate files in mapper (Mapreduce)

I'm new to Hadoop and I'm trying to understand how mappers and reducers work. My problem (and question) is:

I have a long mapper that needs to generate some intermediate files before it emits [key,value] pairs. For example, within a single mapper, files A and B are the inputs and I need an output file E, produced through some intermediate files that can't be reduced:

file A -> file C
file B -> file D
file C + file D -> file E

Is it possible to achieve this? Do intermediate files stay on the nodes?

Upvotes: 0

Views: 1082

Answers (1)

Venkat

Reputation: 1810

This cannot be achieved conventionally, but two approaches are possible:

  • From each mapper, you could create and write to a file on HDFS. Depending on the size of the input data and the number of input files, many mappers may run in parallel, so each file name has to be unique.
  • Better approach: emit a complex key from the mapper to the reducer. This complex key has two parts, IdentifierOfKey:Key, where IdentifierOfKey is simply a flag saying that the record needs to go to file E. In the reducer you can then use multiple outputs to write the data into multiple files.
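
To make both ideas a bit more concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It only illustrates the naming and key-tagging logic, not a real job. Everything in it is an assumption made for illustration: the class name `CompositeKeyRouting`, the helpers `uniqueSideFile`, `tag`, `fileIdOf`, `keyOf` and `route`, the `:` separator, and the `intermediate-` file prefix. In an actual job, the unique id would typically come from the task attempt id, and the grouping done here by `route` would be done in the reducer, for example with Hadoop's `MultipleOutputs`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CompositeKeyRouting {
    // Hypothetical separator between the file identifier and the real key.
    static final String SEP = ":";

    // Approach 1: build a per-task file name so parallel mappers never collide.
    // taskAttemptId is assumed to be unique per running mapper.
    static String uniqueSideFile(String baseDir, String taskAttemptId) {
        return baseDir + "/intermediate-" + taskAttemptId;
    }

    // Approach 2, mapper side: prefix the real key with the target file id.
    static String tag(String fileId, String key) {
        return fileId + SEP + key;
    }

    // Reducer side: everything before the first separator is the file id.
    static String fileIdOf(String compositeKey) {
        int i = compositeKey.indexOf(SEP);
        if (i < 0) {
            throw new IllegalArgumentException("untagged key: " + compositeKey);
        }
        return compositeKey.substring(0, i);
    }

    // Reducer side: everything after the first separator is the original key.
    static String keyOf(String compositeKey) {
        int i = compositeKey.indexOf(SEP);
        if (i < 0) {
            throw new IllegalArgumentException("untagged key: " + compositeKey);
        }
        return compositeKey.substring(i + 1);
    }

    // Stand-in for the reducer: collect "key<TAB>value" lines per target file.
    // Each record is a two-element array {compositeKey, value}.
    static Map<String, List<String>> route(List<String[]> records) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (String[] kv : records) {
            String line = keyOf(kv[0]) + "\t" + kv[1];
            files.computeIfAbsent(fileIdOf(kv[0]), id -> new ArrayList<>()).add(line);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> emitted = new ArrayList<>();
        emitted.add(new String[] {tag("C", "k1"), "derived from A"});
        emitted.add(new String[] {tag("D", "k2"), "derived from B"});
        emitted.add(new String[] {tag("E", "k3"), "final record"});
        // Print each target file id with the lines that would be written to it.
        route(emitted).forEach((file, lines) -> System.out.println(file + " -> " + lines));
    }
}
```

Splitting on the first separator only means the original key may itself contain `:` without breaking the routing; the file identifier, however, must not contain it.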

Upvotes: 1
