Brian

Reputation: 7326

Sqoop-2 fails on large import to single node with custom query using sqoop shell

I am prototyping the migration of a large record set generated by a computationally expensive custom query. This query takes approximately 1-2 hours to return a result set in SQL Developer.

I am attempting to pass this query to a simple Sqoop job with links from JDBC to HDFS.
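Roughly, my shell session looks like the sketch below (the connector/link/job IDs come from show connector and are placeholders here, the column and schema names are placeholders, and the exact flags and prompts can vary between 1.99.x releases). As far as I understand, the ${CONDITIONS} token is what the Generic JDBC Connector substitutes with its partition predicate:

sqoop:000> create link -c 1          <- Generic JDBC Connector: Oracle JDBC URL, user, password
sqoop:000> create link -c 3          <- HDFS connector: target HDFS URI
sqoop:000> create job -f 1 -t 2      <- FROM = the JDBC link, TO = the HDFS link
    Table SQL statement: SELECT <columns> FROM <schema>.<table_or_view> WHERE ${CONDITIONS}
    Partition column name: <numeric or date column>
sqoop:000> start job -j 1
sqoop:000> status job -j 1           <- this is the call that keeps returning UNKNOWN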

I have encountered the following errors in my logs:

2016-02-12 10:15:50,690 ERROR mr.SqoopOutputFormatLoadExecutor [org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor$ConsumerThread.run(SqoopOutputFormatLoadExecutor.java:257)] Error while loading data out of MR job.
org.apache.sqoop.common.SqoopException: GENERIC_HDFS_CONNECTOR_0005:Error occurs during loader run
    at org.apache.sqoop.connector.hdfs.HdfsLoader.load(HdfsLoader.java:110)
    at org.apache.sqoop.connector.hdfs.HdfsLoader.load(HdfsLoader.java:41)
    at org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor$ConsumerThread.run(SqoopOutputFormatLoadExecutor.java:250)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/username/schema/recordset/.72dee005-3f75-4a95-bbc0-30c6b565d193/f5aeeecc-097e-49ab-99cc-b5032ae18a84.txt (inode 16415): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_-1820866823_31, pendingcreates: 1]

When I check the resulting .txt files in HDFS, they are empty.

Has anyone encountered and solved this? Also, I am noticing additional bugginess with the Sqoop shell. For example, I am unable to check the job status as it always returns UNKNOWN.

I am using sqoop-1.99.6-bin-hadoop200 with Hadoop 2.7.2 (Homebrew install). I am querying a remote Oracle 11 database with the Generic JDBC Connector.

I have already conducted a smaller import job using the schema/table parameters in create job.
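That smaller job used the plain table prompts instead of a free-form query, i.e. something along these lines (names are placeholders):

    Schema name: MY_SCHEMA
    Table name: SMALL_TABLE
    Table SQL statement:              (left empty)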

I am tempted to migrate the entire schema table by table, then just use Hive to generate and store the record set I want. Would this be a better/easier solution?

Upvotes: 0

Views: 687

Answers (1)

Samson Scharfrichter

Reputation: 9067

org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException

This query takes approximately 1-2 hours to return a result set in SQL Developer

I would bet that Sqoop 1.99 creates an empty HDFS file (i.e. the NameNode gets the request, creates the file but does not yet materialize it for other clients, grants an exclusive write lease to Sqoop, and assigns responsibility for writing block #1 to a random DataNode) then waits for the JDBC ResultSet to produce some data... without doing any keep-alive in the meantime.

But alas, after 60 minutes, the NameNode just sees that the lease has expired without any sign of the Sqoop client being alive, so it closes the file -- or rather, behaves as if it had never been created (no flush has ever occurred).

Any chance you can reduce that delay with a /*+ FIRST_ROWS */ hint on the Oracle side?
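For illustration, the hint just goes at the top of the SELECT you feed to the job as its table SQL statement, something like this (columns and table are placeholders for yours):

SELECT /*+ FIRST_ROWS */ col_a, col_b, col_c
FROM   my_schema.my_expensive_view
WHERE  ${CONDITIONS}

FIRST_ROWS asks the optimizer to favour a plan that returns the first rows quickly rather than one that minimizes total run time, so Sqoop would start receiving (and flushing) data well before the lease can expire.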

Upvotes: 1
