Reputation: 15975
I am using Hadoop, and I want to use static variables to reduce the number of method calls I have to make. Here is how I'm using statics:
public class MyMapper<Types...> extends Mapper<Types...> {

    protected static volatile String myVar;

    @Override
    public final void setup(Context context) {
        if (myVar == null)
            myVar = context.getConfiguration().get("myOpt");
    }
}
I know that a new Mapper instance is created for each map task. My worry is that the Mapper class itself is loaded once and stays loaded between jobs. So if I run job1, myVar will be set to "myOpt1", and when I then run job2, myVar will stay "myOpt1" even though I passed in "myOpt2". Is this fear unfounded? Thanks.
Upvotes: 1
Views: 271
Reputation: 30089
If you configure JVM reuse to a value greater than 1, it's plausible that a TaskTracker will re-use a JVM for subsequent tasks of a job scheduled to run on it. For example, a JVM reuse value of 5, with 10 tasks ultimately scheduled to run on that TaskTracker, means one JVM is spawned to run the first 5 tasks in sequence and then stopped, after which a second JVM is spawned to run the final 5 tasks in sequence. In this scenario, static variables retain their values across the map tasks that share a JVM, for as long as that JVM stays alive.
This property is mapred.job.reuse.jvm.num.tasks for pre-v2 Hadoop and mapreduce.job.jvm.numtasks from v2 onwards.
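For illustration, here's a minimal sketch of setting that property programmatically on Hadoop 1.x; the class name JvmReuseConfig and the value 5 are arbitrary, and as far as I know a value of -1 means unlimited reuse:

import org.apache.hadoop.conf.Configuration;

public class JvmReuseConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // One JVM may run up to 5 tasks of this job in sequence.
        // The default of 1 disables reuse; -1 removes the limit.
        conf.setInt("mapred.job.reuse.jvm.num.tasks", 5);
        // On Hadoop 2 onwards the equivalent property name is:
        // conf.setInt("mapreduce.job.jvm.numtasks", 5);
    }
}

Note that, as far as I know, this reuse applies to tasks within a single job, so even with reuse enabled statics shouldn't leak between jobs.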
Upvotes: 2
Reputation: 32969
It is unfounded. For each job, the TaskTracker will start up new JVM instances to run that job's tasks.
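If you want to check this yourself, one hypothetical approach is to log the JVM's name (typically pid@hostname, via the standard ManagementFactory API) from setup(); a fresh JVM per task shows a different pid each time:

import java.lang.management.ManagementFactory;

// Inside the mapper from the question:
@Override
public final void setup(Context context) {
    // Prints something like "12345@worker-node"; a new pid means a new JVM,
    // and myVar being null on entry means the static did not survive.
    System.out.println("JVM: " + ManagementFactory.getRuntimeMXBean().getName()
            + ", myVar on entry: " + myVar);
    if (myVar == null)
        myVar = context.getConfiguration().get("myOpt");
}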
Upvotes: 1