Stormcloud

Reputation: 2307

Extracting Data from Oracle to Hadoop: Is Sqoop a good idea?

I'm looking to extract some data from an Oracle database and transfer it to a remote HDFS file system. There appear to be a couple of ways of achieving this:

  1. Use Sqoop. This tool will extract the data, copy it across the network and store it directly in HDFS (something like the command sketched below this list).
  2. Use SQL to read the data and store it on the local file system. When that has completed, copy (FTP?) the data to the Hadoop system.
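For concreteness, this is roughly what I have in mind for option 1; the JDBC URL, credentials, table and column names are placeholders rather than my actual setup:

    # Placeholder values throughout: connection string, credentials, table and split column.
    # --num-mappers controls how many parallel SELECTs hit the database;
    # --split-by names the column Sqoop uses to divide the table between mappers.
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCLPDB \
      --username scott \
      --password-file /user/scott/.oracle_pw \
      --table SALES \
      --target-dir /user/scott/sales \
      --num-mappers 4 \
      --split-by SALE_ID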

My question is: will the first method (which is less work for me) cause Oracle to lock tables for longer than required?

My worry is that Sqoop might take out a lock on the database when it starts to query the data, and that this lock won't be released until all of the data has been copied across to HDFS. Since I'll be extracting a large amount of data and copying it to a remote location (so there will be significant network latency), the lock would be held for longer than would otherwise be required.

Upvotes: 2

Views: 968

Answers (1)

hadooper

Reputation: 746

  • Sqoop issues ordinary SELECT queries against the Oracle database, so it takes the same locks a plain SELECT would; Sqoop performs no additional locking of its own (see the query sketch after this list).

  • Data is transferred in several concurrent tasks (mappers). Any expensive function call in the import query puts a significant load on your database server, and advanced functions could lock certain tables, preventing Sqoop from transferring data in parallel. This will adversely affect transfer performance.

  • For efficient advanced filtering, run the filtering query inside the database prior to the import, save its output to a temporary table, and then run Sqoop to import the temporary table into Hadoop without the --where parameter (a sketch follows after this list).

  • A Sqoop import has nothing to do with copying the data across the cluster afterwards. Sqoop writes the data to HDFS once, and HDFS then replicates it according to the cluster's replication factor.
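To illustrate the first point, the statements Sqoop generates are ordinary SELECTs of roughly the following shape (table, column names and ranges here are only placeholders). A plain SELECT in Oracle does not block writers, so an import like this should not hold table locks for the duration of the copy:

    -- Boundary query, run once to work out how to split the table between mappers
    SELECT MIN(SALE_ID), MAX(SALE_ID) FROM SALES;

    -- Each mapper then runs a plain range query over its share of the key space
    SELECT SALE_ID, CUSTOMER_ID, AMOUNT
    FROM SALES
    WHERE SALE_ID >= 1 AND SALE_ID < 250001;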
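And a sketch of the temporary-table approach from the third point, again with hypothetical table, column and directory names. Run the expensive filtering once, inside the database:

    CREATE TABLE SALES_2023_EU AS
    SELECT *
    FROM SALES
    WHERE SALE_DATE >= DATE '2023-01-01'
      AND REGION = 'EU';

Then import the pre-filtered table without any --where clause, so the mappers only run simple range scans:

    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCLPDB \
      --username scott \
      --password-file /user/scott/.oracle_pw \
      --table SALES_2023_EU \
      --target-dir /user/scott/sales_2023_eu \
      --num-mappers 4 \
      --split-by SALE_ID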

Upvotes: 1
