Reputation: 2204
I am trying to read a sequence file from the distributed cache on EMR, but the job is unable to read it. My code works fine locally but fails on EMR. Here is my code snippet.
Putting the sequence file into the distributed cache:
job.addCacheFile(new URI(status.getPath().toString()));
Reading the paths of the cached files:
for (Path eachPath : cacheFilesLocal) {
loadMap(eachPath.getName(),context.getConfiguration());
}
Reading the file from the path:
private void loadMap(String filePath, Configuration conf) throws IOException
{
    try {
        Path somePath = new Path(filePath);
        reader = new Reader(somePath.getFileSystem(conf), somePath, conf);
        // brReader = new BufferedReader(new FileReader(filePath));
        Writable key = new Text();
        Writable value = new Text();
        // Read each key/value record and load it into the HashMap
        while (reader.next(key, value)) {
            // String index[]=strLineRead.toString().split(Pattern.quote(" - "));
            rMap.put(key.toString(), value.toString());
        }
    }
    catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    finally {
        if (reader != null) {
            reader.close();
        }
    }
}
Any help will be appreciated.
Upvotes: 2
Views: 249
Reputation: 411
Provide the S3 paths of the cache files in the job arguments, as described in the documentation.
Then, in the Driver class, use those arguments like this:
job.addCacheFile(new URI(args[3]));
job.addCacheFile(new URI(args[4]));
job.addCacheFile(new URI(args[5]));
In the Mapper, use the cache files as usual:
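cacheFiles = context.getCacheFiles();
if (cacheFiles != null) {
    // Cached files are linked into the task's working directory, so open them by name
    File cityCacheFile = new File("AreaCityCountryCache");
    // ... read cityCacheFile here ...
}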
This worked for me.
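A note on where the name "AreaCityCountryCache" would come from: Hadoop links each cached file into the task's working directory, using the URI fragment (the part after `#`) as the link name when one is given, and the file's own name otherwise. The sketch below only illustrates that naming rule with plain `java.net.URI`; it is not Hadoop code, and the class name, bucket, and file names are made up for the example.

```java
import java.net.URI;

public class CacheLinkName {
    // Returns the name a cached file is expected to have in the task's
    // working directory: the URI fragment if present, else the last path segment.
    public static String localName(URI cacheUri) {
        String fragment = cacheUri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = cacheUri.getPath();
        if (path == null) {
            return "";
        }
        int slash = path.lastIndexOf('/');
        return slash >= 0 ? path.substring(slash + 1) : path;
    }

    public static void main(String[] args) {
        // Hypothetical S3 paths, in the form they might be passed as job arguments
        System.out.println(localName(URI.create("s3://my-bucket/cache/cities.seq#AreaCityCountryCache")));
        System.out.println(localName(URI.create("s3://my-bucket/cache/cities.seq")));
    }
}
```

So if the argument ends in `#AreaCityCountryCache`, opening `new File("AreaCityCountryCache")` in the Mapper should find the file; without a fragment, the plain file name would be the one to use.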
Upvotes: 0