Nipun

Reputation: 4319

Hadoop: Getting the input file name in the mapper only once

I am new to Hadoop and currently working with it. I have a small question.

I have around 10 files in an input folder that I need to pass to my MapReduce program. I want the file name in my mapper, because the file name contains the time at which the file was created. I have seen people use FileSplit to get the file name in the mapper. If my input files contain millions of lines, then every time the map code is called it will get the file name and extract the time from it, which is obviously repeated, time-consuming work for the same file. Once I have the time in the mapper, I should not have to extract it from the file name again and again.

How can I achieve this?

Upvotes: 2

Views: 3205

Answers (1)

Ashrith

Reputation: 6855

You could use the Mapper's setup() method to get the file name. setup() is guaranteed to run only once per map task, before any calls to map(), like this:

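import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class MapperRSJ extends Mapper<LongWritable, Text, CompositeKeyWritableRSJ, Text> {
  // Name of the input file for this map task, read once in setup()
  private String filename;

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // The input split identifies the file this task is reading
    FileSplit fsFileSplit = (FileSplit) context.getInputSplit();
    filename = fsFileSplit.getPath().getName();
  }

  @Override
  public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    // process each key value pair; filename is already set here
  }
}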
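
Since the question is really about extracting the time from the file name only once, the parsing can also be done in setup() and the result kept in a field. The following is only a sketch: the question does not say how the time is encoded, so the helper class FileNameTime and the name layout it expects (a yyyyMMdd_HHmm stamp somewhere in the name, as in "events_20130415_1030.log") are assumptions made for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: neither the class name nor the file-name layout comes from the question.
final class FileNameTime {
    // Assumed layout: a yyyyMMdd_HHmm stamp somewhere in the name, e.g. "events_20130415_1030.log".
    private static final Pattern STAMP = Pattern.compile("(\\d{8}_\\d{4})");
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyyMMdd_HHmm");

    private FileNameTime() {}

    // Returns the time encoded in the file name, or throws if no stamp is found.
    static LocalDateTime parse(String fileName) {
        Matcher m = STAMP.matcher(fileName);
        if (!m.find()) {
            throw new IllegalArgumentException("No timestamp in file name: " + fileName);
        }
        return LocalDateTime.parse(m.group(1), FORMAT);
    }
}
```

In the mapper, setup() would then call something like FileNameTime.parse(fsFileSplit.getPath().getName()) once and store the result in a field, so map() only reads the already-parsed value for each record.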

Upvotes: 5
