G Krishna Sampath

Reputation: 97

Reading a specific file from a directory containing many files in Hadoop

I want to read one specific file, selected by name, from a set of files stored in Hadoop. If the file name matches a given name, I want to process that file's data. Here is what I have tried in the map method:

public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException
{
    FileSplit fs = (FileSplit) con.getInputSplit();
    String filename = fs.getPath().getName();
    filename = filename.split("-")[0];
    if (filename.equals("aak"))
    {
        String[] tokens = value.toString().split("\t");
        String name = tokens[0];
        con.write(new Text("mrs"), new Text("filename"));
    }
}

Upvotes: 1

Views: 123

Answers (2)

vefthym

Reputation: 7462

Either use a PathFilter, as Arani suggests (+1 for this), or, if your only criterion for selecting the input file is that its name starts with the string "aak-", change the input path in your main method (Driver class) like this:

replace:

String inputPath = "/your/input/path"; //containing the file /your/input/path/aak-00000   
FileInputFormat.setInputPaths(conf, new Path(inputPath));

with:

String inputPath = "/your/input/path"; //containing the file /your/input/path/aak-00000
FileInputFormat.setInputPaths(conf, new Path(inputPath+"/aak-*"));
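To preview which file names a pattern such as aak-* would select, here is a small stand-alone sketch. It uses the JDK's own java.nio.file glob matcher as a stand-in, which is an assumption made for illustration: Hadoop expands globs itself, and its syntax is similar but not guaranteed to be identical. The class name GlobPreview and the sample file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: previews glob matching on plain file names.
final class GlobPreview {
    // Returns the names from fileNames that match the glob, in input order.
    static List<String> select(String glob, List<String> fileNames) {
        // JDK glob matcher (assumption: close enough to Hadoop's glob for simple "prefix-*" patterns).
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

Calling GlobPreview.select("aak-*", names) on a listing of the input directory keeps only the names that begin with "aak-", which is the set of files the job would then read.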

Upvotes: 1

Arani

Reputation: 182

You need to write a custom PathFilter implementation and then register it with setInputPathFilter on FileInputFormat in your driver code. See the following link:

https://hadoopi.wordpress.com/2013/07/29/hadoop-filter-input-files-used-for-mapreduce/
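As a rough sketch of the name check such a filter could perform, the class below reproduces the test from the question (the part of the file name before the first "-" must equal the wanted name) in plain Java, so it can be tried without a cluster. The class name InputNameCheck and the method name matches are hypothetical. Inside a real PathFilter, the accept(Path path) method could delegate to it with path.getName().

```java
// Hypothetical helper: the file-name test a custom PathFilter's accept(Path) could delegate to.
final class InputNameCheck {
    // True when the text before the first '-' in fileName equals wanted.
    // A name with no '-' is compared as a whole.
    static boolean matches(String fileName, String wanted) {
        int dash = fileName.indexOf('-');
        // indexOf is used instead of split("-")[0], which throws on a name made only of dashes.
        String stem = dash >= 0 ? fileName.substring(0, dash) : fileName;
        return stem.equals(wanted);
    }
}

// Sketch of the Hadoop side (assumes org.apache.hadoop.fs.Path and PathFilter are on the classpath):
//
//   public class AakFilter implements PathFilter {
//       public boolean accept(Path path) {
//           return InputNameCheck.matches(path.getName(), "aak");
//       }
//   }
//
// and in the driver: FileInputFormat.setInputPathFilter(job, AakFilter.class);
```

With the filter in place, the map method no longer needs to inspect the split's file name, because files that do not match are never read.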

Upvotes: 1
