Reputation: 96836
What is the best practice for reporting exceptions in Hadoop streaming with Python scripts?
I mean: let's say I have a mapper script that can't understand its input. How do I signal Hadoop to terminate the job and report an error message?
Do I use logging and finish off with sys.exit?
Upvotes: 3
Views: 1400
Reputation: 30089
If you want to signal an error, exit with a non-zero code from your Python script. You can write any logging to stderr and Hadoop will capture it in the task logs. You can also send status updates to the reporter and increment counters by prefixing stderr lines with reporter:status:<msg> or reporter:counter:<group>,<name>,<increment>.
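A minimal mapper sketch illustrating this, assuming a hypothetical tab-separated input of key and numeric value (the counter group/name "MyMapper"/"GoodRecords" are just illustrative labels):

```python
import sys

def main():
    for line in sys.stdin:
        try:
            # Assumed input format: tab-separated key and numeric value.
            key, value = line.rstrip("\n").split("\t", 1)
            count = int(value)
        except ValueError:
            # Log the problem to stderr (captured in the task logs) and
            # exit non-zero so Hadoop marks this task attempt as failed.
            sys.stderr.write("Malformed input line: %r\n" % line)
            sys.exit(1)

        # Bump a custom counter shown in the job UI.
        sys.stderr.write("reporter:counter:MyMapper,GoodRecords,1\n")
        # Optionally update the task's status message.
        sys.stderr.write("reporter:status:processing %s\n" % key)

        # Emit the key/value pair for the reducer.
        print("%s\t%d" % (key, count))

if __name__ == "__main__":
    main()
```

By default a non-zero exit marks the task attempt as failed, and once the configured retry limit is exhausted the whole job fails (this behavior is controlled by stream.non.zero.exit.is.failure, which defaults to true).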
Upvotes: 4