Reputation: 1327
I am trying to copy files from S3 to HDFS using the following command:
hadoop distcp s3n://bucketname/filename hdfs://namenodeip/directory
However, this is not working; I am getting the following error:
ERROR tools.DistCp: Exception encountered
java.lang.IllegalArgumentException: Invalid hostname in URI
I have tried adding the S3 keys to the Hadoop configuration XML, but that did not work either. Please walk me through the appropriate step-by-step procedure to copy files from S3 to HDFS.
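For reference, adding the keys to the Hadoop configuration typically looks like this (property names as documented for the s3n filesystem; the values shown are placeholders):
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>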
Thanks in advance.
Upvotes: 2
Views: 27527
Reputation: 1327
The command should be like this:
hadoop distcp s3n://bucketname/directoryname/test.csv /user/myuser/mydirectory/
This will copy test.csv from S3 to the HDFS directory /user/myuser/mydirectory/. Here the S3 file system is used in native mode (s3n). More details can be found at http://wiki.apache.org/hadoop/AmazonS3
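If the keys in the configuration file are not being picked up, they can also be passed on the command line as generic Hadoop options. This is a sketch with placeholder credentials; substitute your own bucket, path, and keys:
hadoop distcp \
  -Dfs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY_ID \
  -Dfs.s3n.awsSecretAccessKey=YOUR_SECRET_ACCESS_KEY \
  s3n://bucketname/directoryname/test.csv /user/myuser/mydirectory/
Also note that embedding the credentials directly in the URI (s3n://KEY:SECRET@bucketname/...) is known to break when the secret key contains a / character, and that can surface as exactly the "Invalid hostname in URI" error, so prefer the configuration-file or -D form.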
Upvotes: 7
Reputation: 3557
This example copies log files stored in an Amazon S3 bucket into HDFS using S3DistCp. Here the --srcPattern option is used to limit the data copied to the daemon logs.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --jobflow j-3GY8JC4179IOJ --jar \
/home/hadoop/lib/emr-s3distcp-1.0.jar \
--args '--src,s3://myawsbucket/logs/j-3GY8JC4179IOJ/node/,\
--dest,hdfs:///output,\
--srcPattern,.*daemons.*-hadoop-.*'
Windows users:
ruby elastic-mapreduce --jobflow j-3GY8JC4179IOJ --jar /home/hadoop/lib/emr-s3distcp-1.0.jar --args '--src,s3://myawsbucket/logs/j-3GY8JC4179IOJ/node/,--dest,hdfs:///output,--srcPattern,.*daemons.*-hadoop-.*'
Please check this link for more details:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html
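If you are already logged in on the cluster's master node, the same copy can also be launched directly with hadoop jar instead of the elastic-mapreduce client. This is a sketch reusing the jar path and arguments from the example above; adjust them for your cluster:
hadoop jar /home/hadoop/lib/emr-s3distcp-1.0.jar \
  --src s3://myawsbucket/logs/j-3GY8JC4179IOJ/node/ \
  --dest hdfs:///output \
  --srcPattern '.*daemons.*-hadoop-.*'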
Hope this helps!
Upvotes: 1