Reputation: 6348
I'm working on a project with MapReduce.
My understanding of Hadoop is that it will separate my data into blocks, which are then turned into splits, where each split corresponds to a single map task.
It would be my assumption that each split would have an ID or number associated with it.
I'm wondering if there is any way to get this split ID/number, or even the block ID/number, as the key to the map function?
i.e.:
map(split_id, data)
Upvotes: 0
Views: 388
Reputation: 1170
The InputSplit's toString() method returns a string describing the split (for a FileSplit, this is the file path, start offset, and length). If we hash this string with MD5, we get an ID that identifies each input split.
// context is the Mapper.Context available in setup() and map()
InputSplit is = context.getInputSplit();
String splitId = MD5Hash.digest(is.toString()).toString();
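The mapper can then use splitId as its key, for example by computing it once in setup() and emitting it as the output key. Note that the input key passed to map() comes from the RecordReader, so the ID is read from the context rather than passed in.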
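To see what such an ID looks like without a cluster, here is a minimal stand-alone sketch that hashes split descriptions with the JDK's MessageDigest instead of Hadoop's MD5Hash. The class name SplitIdSketch, the splitId helper, and the two example strings (written in the path:start+length form that FileSplit.toString() uses) are assumptions for illustration, and the output is not guaranteed to be byte-for-byte identical to what MD5Hash produces.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of Hadoop.
final class SplitIdSketch {

    // Returns the MD5 digest of the split description as 32 lowercase hex characters.
    static String splitId(String splitDescription) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(splitDescription.getBytes(StandardCharsets.UTF_8));
            // BigInteger(1, ...) treats the bytes as unsigned; %032x keeps leading zeros.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every standard JDK is required to ship MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Made-up descriptions of two consecutive 128 MB splits of one file.
        String first = "hdfs://namenode/data/input.txt:0+134217728";
        String second = "hdfs://namenode/data/input.txt:134217728+134217728";
        System.out.println(splitId(first));
        System.out.println(splitId(second));
    }
}
```

Because the description includes the path and the byte range, two different splits hash to different IDs in practice, while the same split hashes to the same ID even if its task is retried.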
Upvotes: 0