Loading the same file into memory for each mapper in Hadoop

I came across an algorithm in which the same file is loaded into main memory for each mapper.

I assume that we must use the distributed cache to get the file, then read it and load it into memory in each mapper. When I implemented this, the map phase took a long time to complete. I suspect this is because the file is read from local disk again for every record the mapper processes.

Am I implementing this correctly?

Are there any other suggestions?

Please help! Thanks in advance!

Answers (1)

Chris Gerken

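Read the file from local disk in the Mapper's setup() method, which runs once per map task before any calls to map(). Keep the parsed contents in an instance variable so that every map() call reuses them instead of reading the file again.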
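A minimal sketch of that pattern, assuming the Hadoop 2 `org.apache.hadoop.mapreduce` API and a hypothetical tab-separated lookup file (`key<TAB>value` per line) that the driver has added to the distributed cache without a `#alias` fragment. The class name `CachedFileMapper` and the helper `loadLookup` are illustrative, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Hypothetical mapper: loads a lookup file from the distributed cache once
 * per task in setup(), then reuses the in-memory copy in every map() call.
 */
public class CachedFileMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Instance variable: lives for the whole map task, across all map() calls.
    private Map<String, String> lookup = new HashMap<>();

    /** Parses "key<TAB>value" lines; static so it can be exercised without Hadoop. */
    static Map<String, String> loadLookup(BufferedReader reader) throws IOException {
        Map<String, String> result = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                result.put(parts[0], parts[1]);
            }
        }
        return result;
    }

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles == null || cacheFiles.length == 0) {
            throw new IOException("Expected a lookup file in the distributed cache");
        }
        // Assumption: the cached file is localized into the task's working
        // directory under its own file name (no #alias fragment was used).
        String localName = new Path(cacheFiles[0].getPath()).getName();
        try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
            lookup = loadLookup(reader);
        }
    }

    @Override
    protected void map(LongWritable offset, Text value, Context context)
            throws IOException, InterruptedException {
        // No disk I/O here: each record is checked against the in-memory map.
        String key = value.toString().split("\t", 2)[0];
        String match = lookup.get(key);
        if (match != null) {
            context.write(new Text(key), new Text(match));
        }
    }
}
```

In the driver, the file would be registered with something like `job.addCacheFile(new URI("/path/to/lookup.txt"))` (the path is hypothetical). Because `setup()` runs once per map task, the file is read once per task rather than once per input record, which is what removes the slowdown described in the question.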
