Reputation: 31
I am very new to Hadoop. I am trying to write a MapReduce job that reads data from two different databases (say MySQL and Postgres). I know that we can read from a single database, for example MySQL, by using DBInputFormat and specifying the JDBC driver as follows:
DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://localhost/mydatabase");
However, how can we do that if we want to read from multiple databases? In other words, how can we specify multiple JDBC drivers in the DBConfiguration?
Upvotes: 0
Views: 610
Reputation: 30089
Another alternative to MultipleInputs would be to run two map-only jobs, one per database, then a final job that uses the output of those jobs as input (with an identity mapper) and performs any merge logic you require in the reducer.
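To make the merge step a bit more concrete, here is a rough plain-Java sketch of what the final job would do. It does not use the Hadoop API at all; it only simulates the tag, shuffle and reduce steps in memory. The class name, the "mysql"/"postgres" tags and the tab-separated "key, value" record layout are all assumptions for illustration, not something Hadoop prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the final merge job (no Hadoop dependency).
class TaggedMergeSketch {

    // Mapper stand-in: split each "key<TAB>value" line and tag the value with its source.
    // Assumes every line contains a tab; a line without one would fail here.
    static List<String[]> tag(String source, List<String> lines) {
        List<String[]> out = new ArrayList<>();
        for (String line : lines) {
            String[] kv = line.split("\t", 2);
            out.add(new String[] {kv[0], source + ":" + kv[1]});
        }
        return out;
    }

    // Shuffle stand-in: group the tagged values by key, as the framework does before reduce.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] p : pairs) {
            grouped.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        return grouped;
    }

    // Reducer stand-in: the merge logic. Here it simply concatenates the tagged values.
    static String reduce(String key, List<String> values) {
        return key + "\t" + String.join(",", values);
    }

    // Runs the three steps over the (assumed) text output of the two map-only jobs.
    static List<String> merge(List<String> mysqlLines, List<String> postgresLines) {
        List<String[]> pairs = new ArrayList<>(tag("mysql", mysqlLines));
        pairs.addAll(tag("postgres", postgresLines));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle(pairs).entrySet()) {
            result.add(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> merged = merge(
                Arrays.asList("1\talice", "2\tbob"),
                Arrays.asList("1\tadmin", "3\tguest"));
        for (String line : merged) {
            System.out.println(line);
        }
    }
}
```

In a real job the order of values within a key is not guaranteed, so the tag is what lets the reducer tell which database a value came from.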
Upvotes: 1
Reputation: 34184
AFAIK, there is no out-of-the-box support for this. As an alternative, you could export the data from your RDBMSs as raw text files and then use MultipleInputs to process them however you want.
I would also suggest having a look at Apache Sqoop, in case you haven't already.
Upvotes: 0