behold

Reputation: 556

How can I get the file path of a data shard in the Mapper of a MapReduce job?

I have a MapReduce job whose input path is: /basedirectory/*/*.txt

Inside basedirectory, I have several subfolders (CaseA, CaseB, etc.), each of which contains HDFS text files.

In the map phase of the job, I want to find out where exactly the data shard came from (e.g. CaseA). How can I achieve that?

I've done something similar for MapReduce jobs with more than one input HBase table, where I cast the split to a TableSplit and call ((TableSplit) context.getInputSplit()).getTableName() to find the actual table name, but I'm not sure what to do for HDFS input files.

Upvotes: 1

Views: 355

Answers (1)

ki2

Reputation: 26

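You can get the input split with context.getInputSplit() (where context is the Mapper.Context passed to your mapper), cast it to FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit), and call getPath() on it to get the file path. The subfolder name (e.g. CaseA) is then getPath().getParent().getName().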
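A minimal sketch of this approach, assuming the new API (org.apache.hadoop.mapreduce) with TextInputFormat or another FileInputFormat subclass, so that each split is a FileSplit. The class name CaseTaggingMapper, the helper caseNameOf, and the choice to emit the folder name as the output key are hypothetical, for illustration only. Because a map task reads a single split, the path is looked up once in setup() instead of on every record.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical mapper: tags every input line with the subfolder (CaseA, CaseB, ...)
// of the file it was read from. Assumes a FileInputFormat-based input format.
public class CaseTaggingMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Text caseName = new Text();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        InputSplit split = context.getInputSplit();
        // InputSplit itself has no getPath(); only FileSplit does, so check before casting.
        if (!(split instanceof FileSplit)) {
            throw new IOException("Expected a FileSplit but got " + split.getClass().getName());
        }
        // e.g. hdfs://namenode:8020/basedirectory/CaseA/part-0001.txt
        Path file = ((FileSplit) split).getPath();
        caseName.set(caseNameOf(file));
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit (subfolder name, original line).
        context.write(caseName, value);
    }

    // Returns the name of the directory that directly contains the file, e.g. "CaseA".
    public static String caseNameOf(Path file) {
        return file.getParent().getName();
    }
}
```

Other input formats (for example CombineFileInputFormat) hand the mapper a different split type, which is why the sketch checks the type before casting rather than failing with a ClassCastException.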

Upvotes: 1
