Reputation: 714
I came across an algorithm in which the same file is loaded into main memory for each mapper.
I assume we must use the distributed cache to get the file, then read it and load it into memory in each mapper. When I implemented this, I found that the map phase took a long time to complete. I suspect this is because the file is read from local disk again for every value passed to the mapper.
Is my implementation correct?
Are there any other suggestions?
Please help. Thanks in advance!
Upvotes: 1
Views: 781
Reputation: 16400
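You want to read the file from local disk in the Mapper's setup() method, which runs once per task, rather than in map(), which runs once per record. Keep the loaded data in an instance variable so that every map() call reuses the same reference.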
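A minimal sketch of that pattern, written as plain Java so it runs without Hadoop on the classpath. The class LookupMapperSketch, its fileLoads counter, and the tab-separated file format are all assumptions made for illustration. In a real job the same shape would go in a subclass of org.apache.hadoop.mapreduce.Mapper, with the loading code in setup(Context) and the lookup in map(...).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a Hadoop Mapper (hypothetical class, not the Hadoop API).
class LookupMapperSketch {
    // Instance variable: holds the loaded file contents for the lifetime of the task.
    private Map<String, String> lookup;

    // Counts how often the file is read; present only to make the effect visible.
    int fileLoads = 0;

    // Plays the role of setup(): called once, before any record is processed.
    // Assumes each line of the file has the form key<TAB>value.
    void setup(Path cachedFile) throws IOException {
        lookup = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(cachedFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
            }
        }
        fileLoads++;
    }

    // Plays the role of map(): called once per record, touches only the in-memory map.
    String map(String key) {
        return lookup.getOrDefault(key, "UNKNOWN");
    }

    public static void main(String[] args) throws IOException {
        // Create a small lookup file to stand in for the distributed-cache copy.
        Path file = Files.createTempFile("lookup", ".tsv");
        Files.write(file, List.of("a\tapple", "b\tbanana"), StandardCharsets.UTF_8);

        LookupMapperSketch mapper = new LookupMapperSketch();
        mapper.setup(file); // one disk read
        for (String key : List.of("a", "b", "c", "a")) {
            System.out.println(key + " -> " + mapper.map(key)); // no disk access here
        }
        System.out.println("file loads: " + mapper.fileLoads);
        Files.delete(file);
    }
}
```

However many records go through map(), the file is read only in setup(). If the loading code sits in map() instead, the disk read repeats for every record, which would explain the slowdown described in the question.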
Upvotes: 1