Reputation: 543
I am implementing a Hadoop application and am almost done with the coding. However, after reading about the in-mapper combining design pattern in the book by Lin & Dyer, I want to improve my code. An efficient implementation of this pattern requires preserving state in the map function across input records and then emitting the results at the end. This can easily be implemented by keeping a data structure as a member variable of the mapper class and emitting its contents in the cleanup method. That implementation is feasible with the "org.apache.hadoop.mapreduce.Mapper" class.
But I was not able to set up the new Hadoop API on my system, so I am working with Hadoop 0.18, which doesn't have the "mapreduce" package. Instead it uses the "mapred" Mapper interface for implementing the map function, and that interface has no cleanup method like "mapreduce.Mapper" does. Can the in-mapper design pattern still be implemented with the old interface? It does have a "close" method, but close takes no arguments and provides no facility for emitting your key-value pairs.
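For reference, here is a sketch (untested, class and field names are my own) of the in-mapper combining pattern with the new API, as described above: word counts are accumulated in a member map and emitted in cleanup. It assumes the Hadoop jars are on the classpath.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: in-mapper combining with the new (org.apache.hadoop.mapreduce) API.
public class NewApiInMapperCombiner
    extends Mapper<LongWritable, Text, Text, LongWritable> {

  // State preserved across calls to map() for the lifetime of the task.
  private final Map<String, Long> counts = new HashMap<String, Long>();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Accumulate instead of emitting one record per token.
    for (String token : value.toString().split("\\s+")) {
      Long c = counts.get(token);
      counts.put(token, c == null ? 1L : c + 1L);
    }
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // Emit the buffered aggregates once, after all input records are seen.
    for (Map.Entry<String, Long> e : counts.entrySet()) {
      context.write(new Text(e.getKey()), new LongWritable(e.getValue()));
    }
  }
}
```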
Upvotes: 0
Views: 442
Reputation: 30089
You can still do setup and cleanup with the old API.
Your mapper needs to implement the Configurable interface (or extend Configured). In this case, when the mapper class is created in MapRunner (via the ReflectionUtils.newInstance method), the setConf(Configuration) method will be called, passing the job configuration. The difference between the new and old API is that you don't have access to the OutputCollector in the old API (whereas in the new API you are passed a Context).
Finally, MapRunner will call the close method of your mapper after all records have been passed to the map method (similar to the cleanup method of the new API). Again, you don't have access to the OutputCollector there, so if you want it, you'll need to save a reference to it in your map method.
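A minimal sketch (untested, names are illustrative) of that last point with the old API: the OutputCollector passed to map() is saved in a field so that close() can emit the buffered results. It extends MapReduceBase, which provides empty configure/close implementations in the old API.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch: in-mapper combining with the old (org.apache.hadoop.mapred) API.
public class OldApiInMapperCombiner extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  private final Map<String, Long> counts = new HashMap<String, Long>();
  // Captured in map() so that close() can emit; null until map() runs once.
  private OutputCollector<Text, LongWritable> output;

  public void map(LongWritable key, Text value,
      OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    this.output = output; // keep a reference for close()
    for (String token : value.toString().split("\\s+")) {
      Long c = counts.get(token);
      counts.put(token, c == null ? 1L : c + 1L);
    }
  }

  @Override
  public void close() throws IOException {
    // Called by MapRunner after the last record; emit the buffered counts.
    if (output != null) {
      for (Map.Entry<String, Long> e : counts.entrySet()) {
        output.collect(new Text(e.getKey()), new LongWritable(e.getValue()));
      }
    }
  }
}
```

The null check in close() guards against an empty input split, in which case map() is never called and no collector was captured.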
Upvotes: 1