Reputation: 129
I need to move a large amount of data from an Oracle database to Hadoop without connecting the two systems. Is it possible to export data from Oracle via Sqoop directly to the local filesystem, without importing to HDFS? I'd like to export to ORC and then move the files via external disks to the Hadoop cluster.
Upvotes: 0
Views: 341
Reputation: 1496
You cannot use Sqoop in your case. Sqoop ("SQL to Hadoop") runs in Hadoop and, by default, uses JDBC to connect to the database (as I explain in this answer, you can change that with the --direct option). If the Hadoop nodes cannot connect to the DB server, then you cannot use it.
ORC is a very specific format used by Hive; you would need to find a way to use the Hive libraries to create ORC files outside the Hadoop cluster, if that is possible at all.
Given your constraints, I suggest exporting the data using the DB's dump capabilities into a CSV file, compressing the file, and then copying it into HDFS.
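A rough sketch of that workflow. The table name, connection string, and paths are illustrative; the SQL*Plus `SET MARKUP CSV ON` option assumes SQL*Plus 12.2 or later. Here the actual dump step is shown as a comment, and a small sample CSV stands in for its output:

```shell
# 1. On the Oracle host: dump the table to CSV with sqlplus
#    (hypothetical credentials and table name):
#    sqlplus -s user/pass@ORCL <<'EOF'
#    SET MARKUP CSV ON
#    SET FEEDBACK OFF
#    SPOOL my_table.csv
#    SELECT * FROM my_table;
#    SPOOL OFF
#    EOF

# For illustration, create a small CSV in place of the real dump:
printf 'id,name\n1,alice\n2,bob\n' > my_table.csv

# 2. Compress before copying to the external disks:
gzip -f my_table.csv

# 3. On the Hadoop cluster, after moving the disks over:
#    hdfs dfs -put my_table.csv.gz /user/hive/staging/
```

Compressing is worthwhile here: CSV text typically shrinks several-fold, which matters when the transfer medium is physical disks.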
If you are planning to use Hive, you can then LOAD the text file into a staging table and copy it into a table configured to store its data as ORC.
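The Hive side could look like the following DDL sketch; table names, column types, and the HDFS path are illustrative, and Hive reads the gzip-compressed text file transparently:

```sql
-- Staging table over the uploaded text file(s):
CREATE TABLE my_table_staging (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA INPATH '/user/hive/staging/my_table.csv.gz'
INTO TABLE my_table_staging;

-- Final ORC-backed table, populated from the staging table:
CREATE TABLE my_table STORED AS ORC AS
SELECT * FROM my_table_staging;
```

The conversion to ORC then happens inside the cluster, where the Hive libraries are available, instead of on the Oracle side.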
Upvotes: 1