Reputation: 7326
I am prototyping the migration of a large record set generated by a computationally expensive custom query. This query takes approximately 1-2 hours to return a result set in SQL Developer.
I am attempting to pass this query to a simple Sqoop job with a JDBC link as the source and an HDFS link as the destination.
I have encountered the following errors in my logs:
2016-02-12 10:15:50,690 ERROR mr.SqoopOutputFormatLoadExecutor [org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor$ConsumerThread.run(SqoopOutputFormatLoadExecutor.java:257)] Error while loading data out of MR job.
org.apache.sqoop.common.SqoopException: GENERIC_HDFS_CONNECTOR_0005:Error occurs during loader run
    at org.apache.sqoop.connector.hdfs.HdfsLoader.load(HdfsLoader.java:110)
    at org.apache.sqoop.connector.hdfs.HdfsLoader.load(HdfsLoader.java:41)
    at org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor$ConsumerThread.run(SqoopOutputFormatLoadExecutor.java:250)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/username/schema/recordset/.72dee005-3f75-4a95-bbc0-30c6b565d193/f5aeeecc-097e-49ab-99cc-b5032ae18a84.txt (inode 16415): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_-1820866823_31, pendingcreates: 1]
When I check the resulting .txt files in HDFS, they are empty.
Has anyone encountered and solved this? I am also noticing additional bugginess with the Sqoop shell; for example, I am unable to check the job status because it always returns UNKNOWN.
I am using sqoop-1.99.6-bin-hadoop200 with Hadoop 2.7.2 (Homebrew install). I am querying a remote Oracle 11 database with the Generic JDBC Connector.
I have already completed a smaller import job using the schema/table parameters in create job.
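For reference, the query-based job was set up roughly like this in the Sqoop shell (a sketch only: the connector and link IDs below are illustrative, show connector and show link report the actual values, and the prompts vary slightly between 1.99.x releases):

    sqoop:000> set server --host localhost --port 12000 --webapp sqoop
    sqoop:000> show connector            # note the IDs of generic-jdbc-connector and hdfs-connector
    sqoop:000> create link -c 1          # JDBC link: prompts for driver class, connection string, username, password
    sqoop:000> create link -c 3          # HDFS link: prompts for the HDFS URI
    sqoop:000> create job -f 1 -t 2      # from the JDBC link (id 1) to the HDFS link (id 2);
                                         # the "From" prompts accept either schema/table names
                                         # or a free-form "Table SQL statement" containing ${CONDITIONS}
    sqoop:000> start job -j 1
    sqoop:000> status job -j 1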
I am tempted to migrate the entire schema table by table, then just use Hive to generate and store the record set I want. Would this be a better/easier solution?
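If I go that route, the Hive step would presumably just be a CREATE TABLE AS SELECT over the imported tables, something like the following (table and column names are placeholders, not my real schema):

    -- hypothetical: rebuild the expensive record set in Hive once the raw tables are in HDFS
    CREATE TABLE recordset
    STORED AS ORC
    AS
    SELECT a.id,
           a.some_col,
           b.other_col
    FROM   schema_table_a a
    JOIN   schema_table_b b ON a.id = b.a_id
    WHERE  a.created_date >= '2015-01-01';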
Upvotes: 0
Views: 687
Reputation: 9067
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException
This query takes approximately 1-2 hours to return a result set in SQL Developer
I would bet that Sqoop 1.99 creates an empty HDFS file (i.e. the NameNode gets the request, creates the file but does not materialize it for other clients yet, grants an exclusive write lease for Sqoop, and assigns responsibility for writing block#1 to a random DataNode) then waits for the JDBC ResultSet to produce some data... without doing any keep-alive in the meantime.
But alas, after 60 minutes, the NameNode just sees that the lease has expired without any sign of the Sqoop client being alive, so it closes the file -- or rather, behaves as if it was never created (no flush has ever occurred).
Any chance you can reduce the time lapse with a /*+ FIRST_ROWS */ hint on the Oracle side?
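Something along these lines, i.e. the hint goes right after the SELECT keyword of your custom query (the table and columns below are placeholders for whatever your query actually selects):

    -- FIRST_ROWS asks the optimizer to start returning rows as soon as possible
    -- instead of optimizing for total throughput
    SELECT /*+ FIRST_ROWS */
           t.id,
           t.amount
    FROM   my_schema.my_table t
    WHERE  t.created_date >= DATE '2015-01-01';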
Upvotes: 1