Reputation: 990
I have data in a PostgreSQL DB and I'd like to get it, process it, and save it to an HBase DB. Is it possible to somehow distribute the JDBC operation in a Map operation?
Upvotes: 2
Views: 1038
Reputation: 6686
Yes, you can do that with `DBInputFormat`:

`DBInputFormat` uses JDBC to connect to data sources. Because JDBC is widely implemented, `DBInputFormat` can work with MySQL, PostgreSQL, and several other database systems. Individual database vendors provide JDBC drivers to allow third-party applications (like Hadoop) to connect to their databases.

`DBInputFormat` is an `InputFormat` class that allows you to read data from a database. An `InputFormat` is Hadoop's formalization of a data source; it can mean files formatted in a particular way, data read from a database, etc. `DBInputFormat` provides a simple method of scanning entire tables from a database, as well as the means to read from arbitrary SQL queries performed against the database.
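To make this concrete, here is a minimal sketch of a MapReduce job that reads a PostgreSQL table through `DBInputFormat`. The table name (`employees`), columns (`id`, `name`), JDBC URL, and credentials are all placeholders for illustration; substitute your own schema and connection details. It needs the Hadoop and PostgreSQL JDBC jars on the classpath, so treat it as an outline rather than a drop-in program.

```java
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PostgresRead {

    // A record class implementing Writable and DBWritable so Hadoop can
    // deserialize each row that DBInputFormat hands to the mappers.
    public static class EmployeeRecord implements Writable, DBWritable {
        long id;
        String name;

        public void readFields(ResultSet rs) throws SQLException {
            id = rs.getLong(1);
            name = rs.getString(2);
        }
        public void write(PreparedStatement ps) throws SQLException {
            ps.setLong(1, id);
            ps.setString(2, name);
        }
        public void readFields(java.io.DataInput in) throws IOException {
            id = in.readLong();
            name = Text.readString(in);
        }
        public void write(java.io.DataOutput out) throws IOException {
            out.writeLong(id);
            Text.writeString(out, name);
        }
    }

    public static class RowMapper
            extends Mapper<LongWritable, EmployeeRecord, LongWritable, Text> {
        protected void map(LongWritable key, EmployeeRecord row, Context ctx)
                throws IOException, InterruptedException {
            // Each mapper receives a slice of the table; transform the row
            // here (e.g. prepare it for writing to HBase).
            ctx.write(new LongWritable(row.id), new Text(row.name));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DBConfiguration.configureDB(conf,
                "org.postgresql.Driver",
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");

        Job job = Job.getInstance(conf, "postgres-read");
        job.setJarByClass(PostgresRead.class);
        job.setMapperClass(RowMapper.class);
        job.setInputFormatClass(DBInputFormat.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/postgres-read-out"));

        // DBInputFormat splits the table across mappers, ordering by "id",
        // so the JDBC reads themselves are distributed.
        DBInputFormat.setInput(job, EmployeeRecord.class,
                "employees",     // table
                null,            // WHERE conditions (none)
                "id",            // orderBy column used for splitting
                "id", "name");   // columns to select
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The key call is `DBInputFormat.setInput`, which tells Hadoop how to partition the table so that each map task issues its own bounded JDBC query.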
Upvotes: 3
Reputation: 758
I think you're looking for Sqoop, which is designed to import from SQL databases into the HDFS stack of technologies. It puts the data it gets from a JDBC connection into HDFS, thereby splitting it across your Hadoop DataNodes. I believe this is what you are looking for.
SQL to hadOOP = SQOOP, get it?
Sqoop can import into HBase. See this link.
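A Sqoop import straight into HBase looks roughly like the command below. The host, database, table, column family, and row-key column are hypothetical; adjust them to your environment.

```shell
# Import the PostgreSQL table "employees" directly into an HBase table.
# Connection details and names here are placeholders.
sqoop import \
  --connect jdbc:postgresql://dbhost:5432/mydb \
  --username user -P \
  --table employees \
  --hbase-table employees \
  --column-family cf \
  --hbase-row-key id \
  --hbase-create-table
```

Sqoop runs the import as a MapReduce job under the hood, so the JDBC reads are distributed across mappers much like the `DBInputFormat` approach, without you writing the job yourself.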
Upvotes: 2