Reputation: 37
Can a MapReduce program be configured so that only the reducer is executed and not the mapper (even though a map function is defined in the program)? Can this be achieved just by changing the job configuration?
I want to implement incremental computation in MapReduce (with append-only files as input). For example, for wordcount:
Suppose wordcount has already been executed on a file, after which more data is appended to the input file.
If wordcount is executed again on the updated input file, I want to run it only on the new data and then combine the old results with the new ones. For this combining of outputs, I want to execute the reducer alone, separately.
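The combining step described above can be sketched in plain Python. This is not from the thread, just a minimal illustration, assuming the previous run's output is available as a word-to-count mapping (the function name and inputs are hypothetical):

```python
# Illustrative sketch: merge the previous wordcount output with
# counts computed only on the newly appended data.
from collections import Counter

def merge_counts(old_output, new_counts):
    """old_output: dict of word -> count from the previous run.
    new_counts: dict of word -> count from the appended data only.
    Returns the combined counts, equivalent to re-running the
    reducer over both result sets."""
    merged = Counter(old_output)
    merged.update(new_counts)  # adds counts key-wise
    return dict(merged)
```

This is exactly the per-key summing a wordcount reducer performs, which is why running the reducer alone over the old output plus the new partial counts would suffice.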
Upvotes: 0
Views: 391
Reputation: 382
Yes, you can! Use this code as the mapper (Python version):
import sys
for line in sys.stdin:
    sys.stdout.write(line)
This does the trick: a mapper is mandatory in Hadoop, so the dummy mapper simply passes the input through unchanged.
I hope that helps!
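To complete the picture, here is a sketch of the reducer side (not part of the original answer). It assumes Hadoop Streaming's default format of tab-separated `word\tcount` lines, delivered sorted by key:

```python
# Streaming-style reducer sketch: sums counts per word from
# tab-separated "word\tcount" lines on stdin.
import sys

def reduce_counts(lines):
    """Accumulates word -> total count from an iterable of
    'word\\tcount' lines and returns the totals as a dict."""
    totals = {}
    for line in lines:
        word, count = line.rstrip("\n").split("\t")
        totals[word] = totals.get(word, 0) + int(count)
    return totals

if __name__ == "__main__":
    # Emit results in the same tab-separated format.
    for word, total in sorted(reduce_counts(sys.stdin).items()):
        print("%s\t%d" % (word, total))
```

Paired with the identity mapper above, this effectively runs "reducer only" over whatever count records you feed in, which is the combining step the question asks about.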
Upvotes: 0
Reputation: 20969
No, this is not possible. Hadoop requires you to do a map, while the reduce is optional.
If you want to do a group-by, you can try Apache Tez
and configure a DAG that does the same as what you want to achieve (it might still be hacky, because you would need to use the internal data format).
Upvotes: 1