Reputation: 37

Configure MapReduce program to run only reducer in existing program

Can a MapReduce program be configured so that only the reducer runs and the mapper is skipped, even though the program defines a map function? Can this be achieved just by changing the job configuration?

I want to implement incremental computation in MapReduce, with append-only files as input. Take wordcount as an example:

Suppose wordcount has already been run on a file, and more data is then appended to that input file.

When wordcount is run again on the updated input file, I want it to process only the new data and combine the old results with the new ones. For this combining step, I want to run the reducer on its own.

Upvotes: 0

Answers (2)

Abu Tahir

Reputation: 382

Yes, you can! Use this code as the mapper (Python version):

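    import sys

    # Identity mapper: pass every input line through unchanged.
    for line in sys.stdin:
        # write() avoids the extra newline that print would append
        sys.stdout.write(line)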

This does the trick because a mapper is mandatory, so a dummy mapper simply prints the input unchanged.
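For the combining step itself, a reducer along these lines might work. This is only a minimal sketch: it assumes that both the old wordcount output and the counts for the newly appended data are already "word<TAB>count" lines, and that they reach the reducer sorted by word. The names `parse` and `merge_counts` are made up for illustration.

```python
from itertools import groupby


def parse(line):
    # Assumption: each record looks like "word<TAB>count".
    word, count = line.rstrip("\n").split("\t", 1)
    return word, int(count)


def merge_counts(lines):
    """Sum the counts per word from key-sorted "word<TAB>count" lines."""
    records = (parse(line) for line in lines if line.strip())
    # groupby only merges adjacent records, so the input must be sorted by word.
    for word, group in groupby(records, key=lambda rec: rec[0]):
        yield word, sum(count for _, count in group)


# Old result and new partial count for "apple", already sorted by key.
sample = ["apple\t3\n", "apple\t2\n", "banana\t1\n"]
for word, total in merge_counts(sample):
    print(f"{word}\t{total}")
```

In an actual streaming job the reducer would iterate over `sys.stdin` instead of the in-memory `sample` list.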

I guess that helped!

Upvotes: 0

Thomas Jungblut

Reputation: 20969

No, this is not possible. Hadoop requires you to do a map, while the reduce is optional.

If you want to do a group-by, you can try Apache Tez and configure a DAG that does what you want to achieve (it might still be hacky, because you will need to use the internal data format).

Upvotes: 1
