Reputation: 287
How can I get the name of the input file being processed by the mapper in Hadoop Pipes?
I can easily get the file name in a Java-based MapReduce job like this:
FileSplit fileSplit = (FileSplit) context.getInputSplit();
String filename = fileSplit.getPath().getName();
System.out.println("File name " + filename);
System.out.println("Directory and Filename" + fileSplit.getPath().toString());
But how can I get it in C++?
Please help me.
Thanks
Upvotes: 2
Views: 2604
Reputation: 66
The code below prints the filename:
import os
filepath = os.environ['mapreduce_map_input_file']
filename = os.path.split(filepath)[-1]
print(filename)
Upvotes: 0
Reputation: 3403
By parsing the mapreduce_map_input_file (new) or map_input_file (deprecated) environment variable, you can get the map input file name.
Notice:
The two environment variable names are case-sensitive; all letters must be lower-case.
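Since the question asks about C++, here is a minimal sketch of the same lookup using std::getenv. It assumes the framework actually exports one of the two variables to the C++ process, which may depend on how the job is launched. The helper name mapInputFile is made up for illustration.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the map input file path from the environment,
// or an empty string if neither variable is set.
// The names are lower-case and case-sensitive.
std::string mapInputFile() {
    // Try the newer name first, then fall back to the deprecated one.
    const char* names[] = {"mapreduce_map_input_file", "map_input_file"};
    for (const char* name : names) {
        const char* value = std::getenv(name);
        if (value != nullptr && *value != '\0') {
            return std::string(value);
        }
    }
    return std::string();
}
```

Calling mapInputFile() once at the start of the mapper and checking for an empty result keeps the rest of the code independent of which variable name the cluster uses.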
Upvotes: 1
Reputation: 1148
If you are using Hadoop 2.x with Python:
import os
file_name = os.environ['mapreduce_map_input_file']
Upvotes: 1
Reputation: 301
I struggled with the same problem and found a solution.
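void map(HadoopPipes::MapContext& context) {
  // getInputSplit() returned the file path plus one extra trailing character for me
  std::string path = context.getInputSplit();
  if (!path.empty()) {
    path.erase(path.end() - 1);  // drop that trailing character
  }
}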
I have posted only the part that reads the filename. The getInputSplit() method returns the whole path of the file plus an unknown character at the end. I want the bare path of the file, so I remove the last character of the string. I have no idea why the weird character is appended to the string, but removing it is enough to make the path usable.
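To get only the file name, as getPath().getName() does in the Java version, the last path component can be cut out of the cleaned-up string. This is a sketch that assumes the string is a plain '/'-separated path once the trailing character is removed; baseName is a made-up helper, not part of the Pipes API.

```cpp
#include <string>

// Hypothetical helper: returns the part of `path` after the last '/',
// or the whole string if it contains no '/'.
std::string baseName(const std::string& path) {
    const std::string::size_type slash = path.find_last_of('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}
```

Inside map(), calling baseName(path) after the erase step would then give just the file name.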
Upvotes: 0
Reputation: 9571
Figured out how to do this in Python:
import os
filename = os.environ['map_input_file']
filename is the variable that you want; it holds the name of the file that the mapper is working on.
Some other useful environment variables are:
Upvotes: 0
Reputation: 30089
For streaming / pipes jobs, the job configuration is serialized to process environment variables.
The job configuration property that defines the input file is named map.input.file. The PipeMapRed class, which launches the C++ program, is responsible for this serialization (configure method, line 151), and ensures that the job conf property names are escaped (addJobConfToEnvironment method, line 206/266), meaning that all non a-Za-z0-9 characters are replaced with underscores (safeEnvVarName method, lines 276/284). So the actual environment variable you're looking for in your C++ program will be named map_input_file.
I'm not a C++ programmer, so I can't tell you how to obtain environment variables, but I'm sure it's simple enough.
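For reference, a rough sketch of the escaping rule described above, followed by the environment lookup with std::getenv. The function names toEnvVarName and jobConfFromEnv are made up for illustration and only mirror the described behaviour; they are not taken from the Hadoop source.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: replaces every character outside a-z, A-Z, 0-9
// with an underscore, as described for the job conf property names.
std::string toEnvVarName(const std::string& property) {
    std::string name = property;
    for (char& c : name) {
        const bool keep = (c >= 'a' && c <= 'z') ||
                          (c >= 'A' && c <= 'Z') ||
                          (c >= '0' && c <= '9');
        if (!keep) {
            c = '_';
        }
    }
    return name;
}

// Hypothetical helper: reads a job conf property through its escaped
// environment variable name; returns an empty string if it is not set.
std::string jobConfFromEnv(const std::string& property) {
    const char* value = std::getenv(toEnvVarName(property).c_str());
    return value != nullptr ? std::string(value) : std::string();
}
```

With this, jobConfFromEnv("map.input.file") reads the map_input_file variable, assuming the framework exported it to the process.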
Upvotes: 3