Reputation: 4319
I am new in hadoop and currently working on hadoop. I have a small query.
I have around 10 files in input folder which I need to pass to my map reduce program. I want the file Name in my mapper as my fileName contains the time at which this file got created. I saw people using FileSplit to get the file Name in mapper. If let say my input files contains million of lines then every time mapper code will be called, it will get the file Name and then extract the time from the file, which is obvious a repeated time consuming thing for the same file. Once I get the time in the mapper I do not have to again and again assign the time from the file.
How can I achieve this?
Upvotes: 2
Views: 3205
Reputation: 6855
You could use Mapper's setup
method to get the filename, as setup
method is gaurenteed to run only once before map()
method gets initialized like this:
public class MapperRSJ extends Mapper<LongWritable, Text, CompositeKeyWritableRSJ, Text> {
String filename;
@Override
protected void setup(Context context) throws IOException, InterruptedException {
FileSplit fsFileSplit = (FileSplit) context.getInputSplit();
filename = context.getConfiguration().get(fsFileSplit.getPath().getParent().getName()));
}
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
// process each key value pair
}
}
Upvotes: 5