Reputation: 554
My map function has to read a file for every input record. That file never changes; it is only read. I think the distributed cache might help me a lot, but I can't find a way to use it. The public void configure(JobConf conf) method that I would need to override is, I think, deprecated. JobConf certainly is. All the DistributedCache tutorials use the deprecated way too. What can I do? Is there another configure method that I can override?
These are the very first lines of my map function:
Configuration conf = new Configuration(); //load the MFile
FileSystem fs = FileSystem.get(conf);
Path inFile = new Path("planet/MFile");
FSDataInputStream in = fs.open(inFile);
DecisionTree dtree=new DecisionTree().loadTree(in);
I want to cache MFile so that my map function doesn't need to read it over and over again.
Upvotes: 4
Views: 4221
Reputation: 554
Well, I did it, I think. I followed Ravi Bhatt's tips and wrote this:
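// "in" and "dtree" are fields of the mapper class, so map() can reuse them.
@Override
protected void setup(Context context) throws IOException, InterruptedException
{
    FileSystem fs = FileSystem.get(context.getConfiguration());
    // getCacheFiles returns the URIs registered in the driver with addCacheFile
    URI[] files = DistributedCache.getCacheFiles(context.getConfiguration());
    Path path = new Path(files[0].toString());
    in = fs.open(path);
    // Load the tree once per task instead of once per map() call
    dtree = new DecisionTree().loadTree(in);
}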
Inside my main method I do this to add the file to the cache:
DistributedCache.addCacheFile(new URI(args[0]+"/"+"MFile"), conf);
Job job = new Job(conf, "MR phase one");
I am able to retrieve the file I need this way, but I can't tell yet whether it works 100%. Is there any way to test it? Thanks.
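One rough way to check the "load once" part without a cluster is to simulate the call order in plain Java: setup runs once per task, then map runs once per record. The sketch below uses no Hadoop classes at all; SketchMapper, runTask, and the loadCount counter are hypothetical names made up for illustration, and a list of lines stands in for the DecisionTree. It only shows that the side file is read once however many records are mapped; it does not prove that the real DistributedCache is being used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the mapper lifecycle; no Hadoop classes are used.
class LoadOnceSketch {

    // Hypothetical mapper: reads the side file in setup() and reuses it in map().
    static class SketchMapper {
        int loadCount = 0;                        // how many times the side file was read
        private List<String> model;               // stands in for the DecisionTree
        final List<String> output = new ArrayList<>();

        void setup(Path sideFile) throws IOException {
            model = Files.readAllLines(sideFile); // the only place the file is read
            loadCount++;
        }

        void map(String record) {
            // Uses the already loaded model; no file access here.
            output.add(record + ":" + model.size());
        }
    }

    // Drives one simulated task: setup once, then map for every record.
    static SketchMapper runTask(Path sideFile, List<String> records) throws IOException {
        SketchMapper mapper = new SketchMapper();
        mapper.setup(sideFile);
        for (String record : records) {
            mapper.map(record);
        }
        return mapper;
    }

    public static void main(String[] args) throws IOException {
        Path sideFile = Files.createTempFile("MFile", ".txt");
        Files.write(sideFile, Arrays.asList("node1", "node2"));
        SketchMapper mapper = runTask(sideFile, Arrays.asList("a", "b", "c"));
        System.out.println("records mapped: " + mapper.output.size());
        System.out.println("side file loads: " + mapper.loadCount);
        Files.delete(sideFile);
    }
}
```

In a real job the same idea could be applied by logging or counting inside setup and comparing that number with the number of map tasks, though that is a suggestion rather than something verified here.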
Upvotes: 5
Reputation: 3163
JobConf was deprecated in 0.20.x, but in 1.0.0 it is not! :-) (as of writing this)
To your question: there are two ways to write MapReduce jobs in Java. One is by using (extending) classes in the org.apache.hadoop.mapreduce package, and the other is by implementing classes in the org.apache.hadoop.mapred package (or the other way round).
I'm not sure which one you are using. If you don't have a configure method to override, you will get a setup method to override.
@Override
protected void setup(Context context) throws IOException, InterruptedException
This is similar to configure and should help you.
You get a setup method to override when you extend the Mapper class in the org.apache.hadoop.mapreduce package.
Upvotes: 1