Reputation: 1604
I would like the output of my streaming reducer task to be different for partition number 0 than for the other partitions. How can I tell from within my script what reducer task it is running as?
Upvotes: 0
Views: 587
Reputation: 361
As Nonnib said, if you run your job on MR2/Yarn, mapreduce_task_id is not set. Use mapred_task_id instead.
The only reference I have is a Vowpal Wabbit script. (I also use it in my own Yarn jobs, and it works well with versions up to Hadoop 2.0.0-cdh4.6.0.)
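Here is a minimal Python sketch of that fallback. It makes a few assumptions. It prefers mapreduce_task_partition when that variable is present, and otherwise parses the reduce index out of mapreduce_task_id or mapred_task_id. It relies on the reduce task index matching the partition number, as it does in the IDs quoted in this thread (r_000000 is partition 0). The regex accepts both task IDs (task_..._r_000001) and attempt IDs (attempt_..._r_000001_0), because I am not certain which form mapred_task_id carries on every Hadoop version. The helper name task_partition is hypothetical.

```python
import os
import re

# Matches the reduce ("r") or map ("m") index inside IDs such as
# task_1410791469618_0007_r_000001 or (assumed) attempt_..._r_000001_0.
_TASK_INDEX = re.compile(r"_[mr]_(\d+)")


def task_partition(environ=None):
    """Hypothetical helper: return this task's partition number, or None.

    Checks mapreduce_task_partition first, then falls back to parsing the
    index out of mapreduce_task_id or mapred_task_id.
    """
    env = os.environ if environ is None else environ
    value = env.get("mapreduce_task_partition")
    if value is not None:
        return int(value)
    for name in ("mapreduce_task_id", "mapred_task_id"):
        task_id = env.get(name)
        if task_id:
            match = _TASK_INDEX.search(task_id)
            if match:
                # int("000001") == 1, so zero padding is dropped.
                return int(match.group(1))
    return None  # e.g. when the script is run outside Hadoop
```

Calling task_partition() with no argument reads the real process environment. Passing a dict makes it easy to try locally.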
Upvotes: 1
Reputation: 1604
I just figured out that there are environment variables, mapreduce_task_id and mapreduce_task_partition, that can be read from within the script.
These have different values for different reduce tasks. For example, task 0 has:
mapreduce_task_id=task_1410791469618_0007_r_000000
whereas task 1 has:
mapreduce_task_id=task_1410791469618_0007_r_000001
Similarly, task 0 has:
mapreduce_task_partition=0
and task 1 has:
mapreduce_task_partition=1
In Python, these can be accessed as follows:
import os
# Partition number of this reduce task as a string, e.g. '0' (None if unset)
my_partition = os.environ.get('mapreduce_task_partition')
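Putting this together, here is a minimal sketch of a streaming reducer whose output differs for partition 0. It assumes tab-separated key/value input lines that are sorted by key, as Hadoop Streaming delivers them. The actual workload is hypothetical and illustrative only: summing integer values per key, with only partition 0 writing an extra header line. The function name reduce_stream is also hypothetical.

```python
import os
import sys


def reduce_stream(lines, out, partition):
    """Hypothetical reducer: sum integer values per key (input sorted by key)."""
    if partition == 0:
        # Illustrative special case: only partition 0 emits a header line.
        out.write("key\tcount\n")

    current_key, total = None, 0
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                out.write(f"{current_key}\t{total}\n")
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        out.write(f"{current_key}\t{total}\n")


def main():
    # mapreduce_task_partition arrives as a string such as "0";
    # None means the script is running outside Hadoop.
    raw = os.environ.get("mapreduce_task_partition")
    partition = int(raw) if raw is not None else None
    reduce_stream(sys.stdin, sys.stdout, partition)


if __name__ == "__main__":
    main()
```

Keeping the partition check in main() and passing the number into reduce_stream lets you test the reducer locally without setting any environment variables.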
Upvotes: 1