JRaSH

Reputation: 53

Is Mapper Object of Hadoop Shared across Multiple Threads?

I'm wondering if it is possible to add a member object that can be reused across multiple map() calls. For example, a StringBuilder:

private StringBuilder builder = new StringBuilder();

public void map(...){
    ...

    builder.setLength(0);
    builder.append(a);
    builder.append(b);
    builder.append(c);
    d = builder.toString();

    ...
}

Obviously, if the mapper object is shared across multiple threads, the builder object above will not behave as expected due to concurrent access from more than one thread.

So my question is: is it guaranteed that each thread in Hadoop will use one dedicated mapper object for itself? Or is that behavior configurable?
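For what it's worth, here is the reuse pattern I have in mind as a runnable plain-Java sketch (no Hadoop types; the class and method names are illustrative only):

```java
// Plain-Java stand-in for a Mapper with a reusable per-instance buffer.
public class BufferingMapper {
    private final StringBuilder builder = new StringBuilder();

    // Stands in for map(): joins the pieces using the shared buffer.
    public String map(String a, String b, String c) {
        builder.setLength(0); // reset instead of allocating a new builder
        builder.append(a).append(b).append(c);
        return builder.toString();
    }

    public static void main(String[] args) {
        BufferingMapper m = new BufferingMapper();
        System.out.println(m.map("foo", "bar", "baz")); // prints "foobarbaz"
        System.out.println(m.map("x", "y", "z"));       // prints "xyz"
    }
}
```

This is perfectly safe single-threaded; my worry is only about what happens if map() is invoked concurrently on the same instance.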

Thanks

Upvotes: 5

Views: 901

Answers (2)

Thomas Jungblut

Reputation: 20969

As long as you are not using the MultithreadedMapper class, but your own, there will be no problem: map() is called sequentially, not in parallel.

It is common to use a StringBuilder or other data structures to buffer a few objects between calls. But make sure you clone whatever you buffer from your input objects: to prevent lots of GC, Hadoop keeps only one input object and refills it over and over again.
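A self-contained sketch of why that cloning matters (plain Java, no Hadoop types; a StringBuilder stands in for the single reused input object):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates why buffered inputs must be cloned: the framework refills one
// and the same object on every map() call.
public class ReuseDemo {
    // Returns {first buffered reference's content, first buffered copy}.
    static String[] run() {
        StringBuilder reused = new StringBuilder(); // the one reused instance
        List<StringBuilder> byReference = new ArrayList<>();
        List<String> byCopy = new ArrayList<>();

        for (String value : new String[] {"a", "b", "c"}) {
            reused.setLength(0);
            reused.append(value);          // "framework" refills the object
            byReference.add(reused);       // WRONG: buffers the shared instance
            byCopy.add(reused.toString()); // RIGHT: buffers an immutable copy
        }
        return new String[] {byReference.get(0).toString(), byCopy.get(0)};
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println(r[0]); // prints "c" -- every reference sees the last value
        System.out.println(r[1]); // prints "a" -- the copy kept its value
    }
}
```

With real Writables the equivalent of toString() is making a copy, e.g. new Text(value).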

So there is no need to synchronize or take care of race conditions.

Upvotes: 2

Charles Menguy

Reputation: 41448

I don't think that's possible. The reason is that each mapper runs in its own JVM (and they may be distributed across different machines), so there's no easy way to share a variable or object across multiple mappers or reducers.

Now, if all your mappers run on the same node, I believe there is a configuration property for JVM reuse somewhere, but honestly I wouldn't bother with it, especially if all you need is a StringBuilder :)
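For reference, in the classic (MRv1) framework the property I have in mind is mapred.job.reuse.jvm.num.tasks; a sketch of the mapred-site.xml entry (the value here is arbitrary):

```xml
<!-- mapred-site.xml (MRv1): let up to 10 tasks of the same job share a JVM;
     -1 means no limit. Note that even with JVM reuse, each task still gets a
     fresh Mapper instance, so only static state would survive across tasks. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>10</value>
</property>
```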

I've seen this question once before, and it could be solved very easily by changing the design of the application. Maybe you can tell us more about what you're trying to accomplish, to see whether this is really needed. If it is, you can still serialize your object, put it in HDFS, then have each mapper read and deserialize it, but that seems backwards.

Upvotes: 0
