How to process lines in a file on a specific Hadoop slave?

Question

We have a custom input format that extends FileInputFormat and generates a separate split for each line of the input file. Each line provides the name of the host on which the mapper handling that line should run.

How do I achieve this?

This is needed because the mapper reads data from a database, and I want the mapper to run on the same machine as the DB server.

Joe Stein · Accepted Answer

Not possible without writing your own implementation within the Hadoop code base.
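For context, a custom split can report preferred hosts through InputSplit#getLocations(), but the scheduler treats those hosts only as locality hints, not as a guarantee, which is consistent with the point above. The sketch below is plain Java with no Hadoop dependency. The class name HostTaggedLine and the tab-separated "<host>\t<payload>" line layout are assumptions made for illustration, not part of the original setup.

```java
import java.util.Objects;

/** Hypothetical holder for one input line of the assumed form "<host>\t<payload>". */
final class HostTaggedLine {
    private final String host;
    private final String payload;

    private HostTaggedLine(String host, String payload) {
        this.host = host;
        this.payload = payload;
    }

    /** Parses one line; the tab-separated layout is an assumption. */
    static HostTaggedLine parse(String line) {
        Objects.requireNonNull(line, "line");
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("expected <host>\\t<payload>: " + line);
        }
        String host = line.substring(0, tab).trim();
        if (host.isEmpty()) {
            throw new IllegalArgumentException("empty host in line: " + line);
        }
        return new HostTaggedLine(host, line.substring(tab + 1));
    }

    String host() {
        return host;
    }

    String payload() {
        return payload;
    }

    /**
     * Same shape as the value a custom split would return from getLocations():
     * the preferred hosts for this split. Hadoop uses it as a hint only.
     */
    String[] locations() {
        return new String[] { host };
    }
}
```

In a real job, the array from locations() would be what the custom split returns from getLocations(). The task may still be scheduled on another node if the preferred host has no free slot.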

If you are trying to add more data to the map input, pass it in as an argument to the job. You can then read it in your map() and concatenate it with the input.
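A minimal sketch of that idea follows, in plain Java so it runs without Hadoop on the classpath. A Map stands in for the job configuration. In an actual job the driver would call Configuration.set(key, value) and the mapper would read the value with context.getConfiguration().get(key). The key name example.extra.data, the class name, and the tab separator are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of passing one extra value to every map() call via job-wide settings. */
final class JobArgumentSketch {
    // Hypothetical configuration key; any unique name would do.
    static final String EXTRA_KEY = "example.extra.data";

    /** Driver side: store the job argument in the (stand-in) configuration. */
    static Map<String, String> configure(String extra) {
        Map<String, String> conf = new HashMap<>();
        conf.put(EXTRA_KEY, extra);
        return conf;
    }

    /** Mapper side: read the argument and append it to the input line. */
    static String map(Map<String, String> conf, String inputLine) {
        String extra = conf.getOrDefault(EXTRA_KEY, "");
        // Leave the input unchanged when no argument was supplied.
        return extra.isEmpty() ? inputLine : inputLine + "\t" + extra;
    }
}
```

For example, JobArgumentSketch.map(JobArgumentSketch.configure("db01"), "row-42") yields the input line followed by a tab and the argument. Note that this only adds data to the input and does not influence where the mapper runs.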
