Reputation: 926
I have a requirement to ingest the data from an Oracle database to Hadoop in real-time.
What's the best way to achieve this on Hadoop?
Upvotes: 5
Views: 4279
Reputation: 1379
Expanding a bit on what @Nickolay mentioned, there are a few options, but the best would be too opinion based to state.
Tungsten (open source)
Tungsten Replicator is an open source replication engine supporting a variety of different extractor and applier modules. Data can be extracted from MySQL, Oracle and Amazon RDS, and applied to transactional stores, including MySQL, Oracle, and Amazon RDS; NoSQL stores such as MongoDB, and datawarehouse stores such as Vertica, Hadoop, and Amazon rDS.
Oracle GoldenGate is a comprehensive software package for real-time data integration and replication in heterogeneous IT environments. The product set enables high availability solutions, real-time data integration, transactional change data capture, data replication, transformations, and verification between operational and analytical enterprise systems. It provides a handler for HDFS.
SharePlex™ Connector for Hadoop® loads and continuously replicates changes from an Oracle® database to a Hadoop® cluster. This gives you all the benefits of maintaining a real-time or near real-time copy of source tables
Upvotes: 3
Reputation: 32063
The important problem here is getting the data out of the Oracle DB in real time. This is usually called Change Data Capture, or CDC. The complete solution depends on how you do this part.
Other things that matter for this answer are:
Coming back to CDC, there are three different approaches to it:
Upvotes: 4
Reputation: 4372
Apache Sqoop is a data transfer tool to transfer bulk data from any RDBMS with JDBC connectivity(supports Oracle also) to hadoop HDFS.
Upvotes: 0