Reputation: 31

Reading from multiple databases in a single job - hadoop

I am very new to Hadoop. I am trying to write a MapReduce job that reads data from two different databases (say, MySQL and Postgres). I know that we can read from a single database, for example MySQL, by using DBInputFormat and specifying the JDBC driver as follows:

DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://localhost/mydatabase");

However, how can we do that if we want to read from multiple databases? In other words, how can we specify multiple JDBC drivers in the DBConfiguration?

Upvotes: 0

Answers (2)

Chris White

Reputation: 30089

Another alternative to MultipleInputs would be to run two map-only jobs (one per database), then a final job that uses the output of those jobs as its input (with an identity mapper) and performs whatever merge logic you require in the reducer.
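To make the three-job pipeline concrete without a cluster, here is a minimal plain-Java simulation (no Hadoop classes, Java 9+ for List.of, so it runs standalone). It assumes the two map-only jobs each wrote "key<TAB>value" lines, which is the default layout of TextOutputFormat. The sample records, the class name ChainedMergeSketch, and the merge rule (sort and comma-join all values seen for a key) are hypothetical stand-ins for whatever merge logic you actually need.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop dependency) of the "two map-only jobs +
// one merging job" pipeline. Sample data and merge rule are hypothetical.
public class ChainedMergeSketch {

    // Identity mapper: split a "key\tvalue" line back into (key, value) unchanged.
    // A line without a tab is treated as a key with an empty value.
    static String[] identityMap(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    // Stand-in for the shuffle: group the values from both job outputs by key,
    // keeping the keys in sorted order.
    static Map<String, List<String>> shuffle(List<String> mysqlOutput, List<String> postgresOutput) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (List<String> output : List.of(mysqlOutput, postgresOutput)) {
            for (String line : output) {
                String[] kv = identityMap(line);
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        return grouped;
    }

    // Hypothetical merge logic: sort the values (Hadoop does not guarantee
    // their order within a reduce call) and join them with commas.
    static String reduce(String key, List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return key + "\t" + String.join(",", sorted);
    }

    // Runs the final (merging) job over the outputs of the two map-only jobs.
    static List<String> run(List<String> mysqlOutput, List<String> postgresOutput) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle(mysqlOutput, postgresOutput).entrySet()) {
            result.add(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> mysqlJobOutput = List.of("42\talice", "7\tbob");
        List<String> postgresJobOutput = List.of("42\tpremium", "9\tcarol");
        run(mysqlJobOutput, postgresJobOutput).forEach(System.out::println);
    }
}
```

In a real job the grouping is done by the framework's shuffle, and the final job's identity mapper would simply read both output directories as input paths; the sketch only mirrors the data flow.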

Upvotes: 1

Tariq

Reputation: 34184

AFAIK, there is no out-of-the-box (OOB) support for this. As an alternative, you could export the data from your RDBMSs as raw text files and then use MultipleInputs to process both exports in a single job, with a separate mapper for each input.
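A rough, Hadoop-free sketch of what the MultipleInputs route does after the export: each exported file gets its own mapper that parses its format and tags the record with its source, and a single reducer joins the records that share a key. The file formats (comma-separated "id,name" from MySQL and pipe-separated "id|total" from Postgres), the "M:"/"P:" tags, the class name MultipleInputsJoinSketch, and the inner-join rule are all assumptions for illustration. In an actual job, each path would be registered with MultipleInputs.addInputPath(job, path, TextInputFormat.class, mapperClass), and the grouping done here by a TreeMap is performed by the shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a MultipleInputs-style reduce-side join.
// Export formats and tags are hypothetical; lines are assumed well-formed.
public class MultipleInputsJoinSketch {

    // Mapper for the (hypothetical) MySQL export: "id,name" per line.
    static String[] mapMysqlLine(String line) {
        String[] f = line.split(",", 2);
        return new String[] { f[0], "M:" + f[1] };
    }

    // Mapper for the (hypothetical) Postgres export: "id|total" per line.
    static String[] mapPostgresLine(String line) {
        String[] f = line.split("\\|", 2);
        return new String[] { f[0], "P:" + f[1] };
    }

    // Reducer: inner join on id, pairing every MySQL value with every Postgres value.
    static List<String> reduce(String id, List<String> tagged) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (String v : tagged) {
            if (v.startsWith("M:")) {
                left.add(v.substring(2));
            } else {
                right.add(v.substring(2));
            }
        }
        List<String> joined = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                joined.add(id + "\t" + l + "\t" + r);
            }
        }
        return joined;
    }

    // Stand-in for map + shuffle + reduce over the two exported files.
    static List<String> run(List<String> mysqlExport, List<String> postgresExport) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : mysqlExport) {
            add(grouped, mapMysqlLine(line));
        }
        for (String line : postgresExport) {
            add(grouped, mapPostgresLine(line));
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    private static void add(Map<String, List<String>> grouped, String[] kv) {
        grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
    }

    public static void main(String[] args) {
        List<String> mysql = List.of("1,alice", "2,bob");
        List<String> postgres = List.of("1|19.99", "1|5.00", "3|7.50");
        run(mysql, postgres).forEach(System.out::println);
    }
}
```

Tagging values with their source is what lets one reducer tell the two databases apart; ids present on only one side are dropped here because the sketch uses an inner join.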

I would also suggest having a look at Apache Sqoop, in case you haven't already.

Upvotes: 0
