mickzer

Reputation: 6348

Hadoop - Get Split Ids in map function

I'm working on a project with map reduce.

My understanding of Hadoop is that it separates my data into blocks, which are then turned into splits, where each split corresponds to a single map task.

It would be my assumption that each split would have an ID or number associated with it.

I'm wondering if there is any way to get this split Id/number or even the block Id/number as the key to the map function?

i.e.:

map(split_id, data)

Upvotes: 0

Views: 388

Answers (1)

madhu

Reputation: 1170

The InputSplit toString() method returns a string describing the split. If we hash this string using MD5Hash, we get a unique ID identifying each of the input splits.

    // Inside map(): for a FileSplit, toString() yields "path:start+length".
    InputSplit split = context.getInputSplit();
    String splitId = MD5Hash.digest(split.toString()).toString();

We can then use splitId as the key in the mapper.
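To see what such an ID looks like without a running cluster, here is a minimal sketch that uses only the JDK's MessageDigest in place of Hadoop's MD5Hash. The class name SplitIdSketch, the method splitId, and the HDFS paths are made up for illustration; the input strings just imitate the "path:start+length" form that FileSplit.toString() produces. A custom InputSplit that does not override toString() would not give a stable string, so this assumes file-based splits.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Hadoop: hashes a split description string.
final class SplitIdSketch {

    // Returns the MD5 digest of the description as a 32-character hex string.
    static String splitId(String splitDescription) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(splitDescription.getBytes(StandardCharsets.UTF_8));
            // %032x pads with zeros so leading zero bytes are not dropped.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Assumed example values: two 128 MB splits of the same file.
        String first = splitId("hdfs://namenode/data/input.txt:0+134217728");
        String second = splitId("hdfs://namenode/data/input.txt:134217728+134217728");
        System.out.println(first);
        System.out.println(second);
    }
}
```

Because the description includes the file path, start offset, and length, two different splits of the same file hash to different IDs, while the same split always hashes to the same ID across task retries.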

Upvotes: 0
