Reputation: 97
I want to read one specific file out of a set of files stored in Hadoop, selected by file name. If the file name matches a given name, I want to process that file's data. Here is what I have tried in the map method:
public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException
{
    // Name of the file this split belongs to, e.g. "aak-00000"
    FileSplit fs = (FileSplit) con.getInputSplit();
    String filename = fs.getPath().getName();
    // Keep only the part before the first "-"
    filename = filename.split("-")[0];
    if (filename.equals("aak"))
    {
        String[] tokens = value.toString().split("\t");
        String name = tokens[0];
        con.write(new Text("mrs"), new Text("filename"));
    }
}
Upvotes: 1
Views: 123
Reputation: 7462
Either use a PathFilter, as Arani suggests (+1 for this), or,
if your only criterion for selecting the input file is that its name starts with the string "aak-", I think you can get what you want simply by changing the input path in your main method (driver class) to a glob pattern, like this:
replace:
String inputPath = "/your/input/path"; //containing the file /your/input/path/aak-00000
FileInputFormat.setInputPaths(conf, new Path(inputPath));
with:
String inputPath = "/your/input/path"; //containing the file /your/input/path/aak-00000
FileInputFormat.setInputPaths(conf, new Path(inputPath + "/aak-*"));
Upvotes: 1
Reputation: 182
You need to write a custom PathFilter implementation and then register it with setInputPathFilter on FileInputFormat in your driver code. Please take a look at the link below:
https://hadoopi.wordpress.com/2013/07/29/hadoop-filter-input-files-used-for-mapreduce/
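As a rough illustration of the check such a filter would perform, here is a minimal sketch in plain Java with no Hadoop dependency. The class name InputNameFilter and the method accepts are hypothetical names made up for this example. In a real job, a class implementing org.apache.hadoop.fs.PathFilter would call something like this from accept(Path path) using path.getName(), and the driver would register that class with FileInputFormat.setInputPathFilter. One thing to check, as far as I know: the filter is also applied to the input directory itself, so a real filter usually has to let the directory through as well.

```java
import java.util.List;
import java.util.stream.Collectors;

public class InputNameFilter {

    // Hypothetical helper: returns true when the part of the file name
    // before the first '-' equals the wanted name (e.g. "aak-00000" -> "aak").
    // A name without any '-' is compared as a whole.
    public static boolean accepts(String fileName, String wanted) {
        int dash = fileName.indexOf('-');
        String head = dash < 0 ? fileName : fileName.substring(0, dash);
        return head.equals(wanted);
    }

    public static void main(String[] args) {
        // Sample file names, assumed for illustration only.
        List<String> names = List.of("aak-00000", "aak-00001", "bbk-00000", "aakx-00000");

        // Keep only the names whose leading part is exactly "aak".
        List<String> kept = names.stream()
                .filter(n -> accepts(n, "aak"))
                .collect(Collectors.toList());

        System.out.println(kept); // prints [aak-00000, aak-00001]
    }
}
```

Filtering at the input level like this means the unwanted files are never split or read, whereas the check inside map still launches mappers over every file and discards most records.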
Upvotes: 1