Jon Cardoso-Silva

Reputation: 990

Is there a way to use JDBC as an input source for Hadoop's MapReduce?

I have data in a PostgreSQL DB and I'd like to read it, process it, and save it to an HBase DB. Is it possible to somehow distribute the JDBC read across Map tasks?

Upvotes: 2

Views: 1038

Answers (2)

twid

Reputation: 6686

Yes, you can do that with DBInputFormat:

DBInputFormat uses JDBC to connect to data sources. Because JDBC is widely implemented, DBInputFormat can work with MySQL, PostgreSQL, and several other database systems. Individual database vendors provide JDBC drivers to allow third-party applications (like Hadoop) to connect to their databases.

The DBInputFormat is an InputFormat class that allows you to read data from a database. An InputFormat is Hadoop’s formalization of a data source; it can mean files formatted in a particular way, data read from a database, etc. DBInputFormat provides a simple method of scanning entire tables from a database, as well as the means to read from arbitrary SQL queries performed against the database.

LINK
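To make this concrete, here is a minimal sketch of a job that reads a PostgreSQL table through DBInputFormat. The class, table, host, and column names (`UserRecord`, `users`, `dbhost`, `id`, `name`) are hypothetical, and the mapper/output setup is omitted:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
    import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;

    public class PostgresImport {

        // Each input record must implement Writable and DBWritable so Hadoop
        // can deserialize it from a JDBC ResultSet and move it between tasks.
        public static class UserRecord implements Writable, DBWritable {
            long id;
            String name;

            public void readFields(ResultSet rs) throws SQLException {
                id = rs.getLong(1);
                name = rs.getString(2);
            }

            public void write(PreparedStatement ps) throws SQLException {
                ps.setLong(1, id);
                ps.setString(2, name);
            }

            public void readFields(DataInput in) throws IOException {
                id = in.readLong();
                name = in.readUTF();
            }

            public void write(DataOutput out) throws IOException {
                out.writeLong(id);
                out.writeUTF(name);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // JDBC driver class, connection string, and credentials
            // (hypothetical values).
            DBConfiguration.configureDB(conf,
                    "org.postgresql.Driver",
                    "jdbc:postgresql://dbhost:5432/mydb",
                    "user", "password");

            Job job = Job.getInstance(conf, "postgres-import");
            job.setJarByClass(PostgresImport.class);
            job.setInputFormatClass(DBInputFormat.class);

            // Scan the whole table. DBInputFormat splits the input by row
            // count using LIMIT/OFFSET; ordering by "id" keeps those splits
            // consistent, one range per map task.
            DBInputFormat.setInput(job, UserRecord.class,
                    "users",            // table name
                    null,               // optional WHERE conditions
                    "id",               // ORDER BY column
                    "id", "name");      // columns to select

            // Mapper and output configuration omitted for brevity; the mapper
            // receives (LongWritable, UserRecord) pairs and could write to HBase.
            job.waitForCompletion(true);
        }
    }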

Upvotes: 3

Kukanani

Reputation: 758

I think you're looking for Sqoop, which is designed to import data from relational databases into the Hadoop stack. It pulls the data over a JDBC connection and writes it into HDFS, splitting it across your Hadoop DataNodes.

SQl to hadOOP = SQOOP, get it?

Sqoop can import into HBase. See this link.
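For illustration, a minimal sqoop import invocation that pulls a PostgreSQL table straight into an HBase table might look like this (hostname, database, table, row key, and column-family names are hypothetical):

    sqoop import \
      --connect jdbc:postgresql://dbhost:5432/mydb \
      --username user \
      --password password \
      --table users \
      --hbase-table users \
      --column-family cf \
      --hbase-row-key id \
      --hbase-create-table

Sqoop runs this as a MapReduce job under the hood, so the import itself is parallelized across map tasks.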

Upvotes: 2
