Reputation: 279
I am copying some data from HDFS to S3 using the command below:
$ hadoop distcp -m 1 /user/hive/data/test/test_folder=2015_09_19_03_30 s3a://data/Test/buc/2015_09_19_03_30
The 2015_09_19_03_30 bucket does not exist in S3. The command successfully copies the data of the /user/hive/data/test/test_folder=2015_09_19_03_30 directory into the S3 2015_09_19_03_30 bucket, but when I execute the same command again it creates another bucket in S3.
I want the data from both runs to end up in the same bucket.
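For reference, this is roughly how I check the destination between runs (a minimal check, assuming the s3a credentials are already configured for the cluster):

# list the target prefix to see what the previous distcp run created there
$ hadoop fs -ls s3a://data/Test/buc/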
Upvotes: 0
Views: 915
Reputation: 1210
Is this the case you were trying? Because it puts the new files in the same bucket:
# first there is no data
$ hadoop fs -ls s3n://testing/
$
# then distcp the data in the input dir to the testing bucket
$ hadoop distcp input/ s3n://testing/
$ hadoop fs -ls s3n://testing/
Found 1 items
drwxrwxrwx - 0 1970-01-01 00:00 s3n://testing/input
$ hadoop fs -ls s3n://testing/input/
Found 3 items
-rw-rw-rw- 1 1670 2016-09-23 13:23 s3n://testing/input/output
-rw-rw-rw- 1 541 2016-09-23 13:23 s3n://testing/input/some.txt
-rw-rw-rw- 1 1035 2016-09-23 13:23 s3n://testing/input/some2.txt
$
# added a new file a.txt under the input path
# and executed the same command
$ hadoop distcp input/ s3n://testing/
$ hadoop fs -ls s3n://testing/input/
Found 4 items
-rw-rw-rw- 1 6 2016-09-23 13:26 s3n://testing/input/a.txt
-rw-rw-rw- 1 1670 2016-09-23 13:23 s3n://testing/input/output
-rw-rw-rw- 1 541 2016-09-23 13:23 s3n://testing/input/some.txt
-rw-rw-rw- 1 1035 2016-09-23 13:23 s3n://testing/input/some2.txt
$
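If you only want files that are new or have changed to be copied on later runs, distcp also has an -update flag. A minimal sketch against the same (made-up) testing bucket:

# -update copies only files that are missing or differ in the destination,
# so repeated runs keep writing into the same path instead of re-copying everything
$ hadoop distcp -update input/ s3n://testing/input/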
Upvotes: 1