user2528987

Reputation: 31

Reading from multiple databases in a single job - hadoop

I am very new to Hadoop. I am trying to write a map-reduce job that reads data from two different databases (say MySQL and Postgres). I know that we can read from a single database, for example MySQL, using DBInputFormat and specifying the JDBC driver as follows:

DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://localhost/mydatabase");

However, how can we do that if we want to read from multiple databases? In other words, how can we specify multiple JDBC drivers in DBConfiguration?

Upvotes: 0

Views: 610

Answers (2)

Chris White

Reputation: 30089

Another alternative to MultipleInputs would be to run two map-only jobs, one per database, and then a final job that takes the output of those jobs as input (with an identity mapper) and performs whatever merge logic you require in the reducer.
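To make the merge step more concrete, here is a minimal plain-Java sketch of the logic such a reducer might run. It does not use the Hadoop API: the grouping by key stands in for the shuffle, and the class name, the `Tagged` type, the source tags "mysql" and "postgres", and the choice of an inner join are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TaggedMergeSketch {

    /** One record as a map-only job might emit it: join key, source tag, payload. */
    static final class Tagged {
        final String key;
        final String source;   // hypothetical tags: "mysql" or "postgres"
        final String payload;

        Tagged(String key, String source, String payload) {
            this.key = key;
            this.source = source;
            this.payload = payload;
        }
    }

    /**
     * Groups records by key (standing in for the shuffle), then pairs every
     * "mysql" payload with every "postgres" payload that shares the key.
     * Keys present in only one source produce no output (inner join).
     */
    static List<String> merge(List<Tagged> records) {
        Map<String, List<Tagged>> byKey = new TreeMap<>();
        for (Tagged t : records) {
            byKey.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t);
        }

        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : byKey.entrySet()) {
            List<String> fromMysql = new ArrayList<>();
            List<String> fromPostgres = new ArrayList<>();
            for (Tagged t : e.getValue()) {
                if ("mysql".equals(t.source)) {
                    fromMysql.add(t.payload);
                } else if ("postgres".equals(t.source)) {
                    fromPostgres.add(t.payload);
                }
            }
            for (String l : fromMysql) {
                for (String r : fromPostgres) {
                    out.add(e.getKey() + "\t" + l + "\t" + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tagged> records = new ArrayList<>();
        records.add(new Tagged("42", "mysql", "alice"));
        records.add(new Tagged("42", "postgres", "order-7"));
        records.add(new Tagged("99", "mysql", "bob"));
        for (String line : merge(records)) {
            System.out.println(line);
        }
    }
}
```

In a real job the two map-only jobs would write the source tag into each record, and the body of the outer loop would live in the reducer's reduce method, which already receives all values for one key.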

Upvotes: 1

Tariq

Reputation: 34184

AFAIK, there is no out-of-the-box support for this. As an alternative, you could export the data from your RDBMSs as raw text files and then use MultipleInputs to process them however you need in a single job.
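As a rough illustration of the export step, the sketch below dumps a query result to tab-separated text using only plain JDBC from the JDK. The class and method names are hypothetical, the JDBC URL and query are whatever your setup uses, and replacing embedded tabs and newlines with spaces is a simplifying assumption rather than a general escaping scheme.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class TableExportSketch {

    /**
     * Joins column values with tabs. NULLs become empty strings, and tabs or
     * line breaks inside a value are replaced by spaces so that each row
     * stays on one line.
     */
    static String toTsvLine(List<String> columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sb.append('\t');
            }
            String value = columns.get(i);
            sb.append(value == null ? "" : value.replaceAll("[\\t\\r\\n]", " "));
        }
        return sb.toString();
    }

    /** Runs the query and writes one tab-separated line per row to out. */
    static void export(String jdbcUrl, String query, Writer out)
            throws SQLException, IOException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(query)) {
            int columnCount = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                List<String> row = new ArrayList<>(columnCount);
                for (int i = 1; i <= columnCount; i++) { // JDBC columns are 1-based
                    row.add(rs.getString(i));
                }
                out.write(toTsvLine(row));
                out.write('\n');
            }
        }
    }
}
```

You would call export once per database, each with its own JDBC URL and the matching driver on the classpath, copy the resulting files into HDFS, and register each file with its own mapper through MultipleInputs.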

I would also suggest having a look at Apache Sqoop, in case you haven't already.

Upvotes: 0
