Reputation: 2158
I ran into this difference over the weekend, while trying to transfer bulk data between different clusters (physically located in separate rooms) over hftp
by doing
hadoop distcp hftp-path-src hftp-path-dst
where an hftp URL is something like hftp://node:50070/more/path
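Concretely, the command had this shape (the hostnames and paths below are made up for illustration):
# placeholder hostnames and paths
hadoop distcp hftp://src-nn.example.com:50070/data/src hftp://dst-nn.example.com:50070/data/dst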
It failed midway, on some files. The logs said
Unhandled internal error. Vertex failed, vertexName=scope-152 ...
I checked those files manually and didn't find anything suspicious. I also tried the following foolish Pig
script to see if it could surprise me
data = LOAD '$src_hftp' USING PigStorage('\t', '-schema');
STORE data INTO '$dst_hftp' USING PigStorage('\t', '-schema');
This, too, failed miserably, with the message
"...DAG did not succeed due to VERTEX_FAILURE"
Now how about
hadoop distcp hdfs-path-src hdfs-path-dst
with hdfs-path being something like hdfs://namenode:8020/more/path.
It worked fine. What? Why?
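For completeness, the working invocation has the same shape with only the protocol swapped (hostnames and paths again made up):
# placeholder hostnames and paths
hadoop distcp hdfs://src-nn.example.com:8020/data/src hdfs://dst-nn.example.com:8020/data/dst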
Many many thanks in advance.
===========================================
In response to @rahulbmv's answer, I did try
hadoop distcp hftp-path-src hdfs-path-dst
which also failed midway; I could see some of the transferred files on the dst HDFS, while others were missing. So I thought this was irrelevant. The reference I was following was http://www.cloudera.com/documentation/archive/cdh/4-x/4-7-1/CDH4-Installation-Guide/cdh4ig_topic_7_2.html.
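To see how far such a run got, the two sides can be compared along these lines (paths are placeholders; hadoop fs -count prints the directory count, file count, and total bytes under each path), and distcp's -update flag can then retry only what is missing or differs:
# placeholder paths
hadoop fs -count hftp://src-nn.example.com:50070/data/src
hadoop fs -count hdfs://dst-nn.example.com:8020/data/dst
hadoop distcp -update hftp://src-nn.example.com:50070/data/src hdfs://dst-nn.example.com:8020/data/dst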
I also tried logging into the dst namenode server and doing
hadoop distcp hftp-path-src normal-path-without-hdfs-or-hftp
Same error happened.
But yes, the writing end should use the hdfs protocol. Even with the hdfs protocol on the destination, the error persisted. As @rahulbmv pointed out, the only remaining difference is really the protocol the reader uses. I will go back and dig up the error messages later today.
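When I do, the plan is to pull the full container logs for the failed run rather than the client-side summary, roughly like this (the application id is a placeholder):
# application id is a placeholder
yarn application -list -appStates FAILED
yarn logs -applicationId application_1234567890123_0042 > vertex_failure.log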
Upvotes: 0
Views: 301