Haifeng Zhang

Reputation: 31915

How to upload multiple files (20K+) to AWS S3

How to upload multiple files to AWS S3?

I tried two ways and both failed:

1) s3cmd shows the following error even though the file is only 270 KB:

   $s3cmd put file_2012_07_05_aa.gz  s3://file.s3.oregon/
   file_2012_07_05_aa.gz -> s3://file.s3.oregon/file_2012_07_05_aa.gz  [1 of 1]
   45056 of 272006    16% in    1s    25.62 kB/s  failed
   WARNING: Upload failed: /file_2012_07_05_aa.gz ([Errno 32] Broken pipe)
   WARNING: Retrying on lower speed (throttle=0.00)
   WARNING: Waiting 3 sec...

2) Using boto's S3 interface.

The boto library works for me only when I create the bucket in "US Standard"; if I choose another region, such as Oregon, the upload fails with "Connection reset by peer".

    import sys

    from boto.s3.connection import S3Connection


    def connect_to_s3(access_key, secret_key):
        # Open a connection to the default (US Standard) S3 endpoint.
        conn = S3Connection(access_key, secret_key)
        return conn


    def percent_cb(complete, total):
        # Progress callback: print a dot each time boto reports progress.
        sys.stdout.write('.')
        sys.stdout.flush()


    def upload_to_s3(bucket, file_name):
        # Create a key named after the local file and upload its contents.
        key = bucket.new_key(file_name)
        key.set_contents_from_filename(file_name, cb=percent_cb, num_cb=10)
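
For reference, a region-aware boto connection would look something like the sketch below (untested; us-west-2 is the Oregon region, and OrdinaryCallingFormat is an assumption to work around the dotted bucket name file.s3.oregon, which can break SSL wildcard-certificate verification under boto's default subdomain calling format):

    import boto.s3
    from boto.s3.connection import OrdinaryCallingFormat

    # Sketch: connect straight to the Oregon (us-west-2) endpoint instead of
    # the default US Standard one. access_key/secret_key are placeholders.
    conn = boto.s3.connect_to_region(
        'us-west-2',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        calling_format=OrdinaryCallingFormat())
    bucket = conn.get_bucket('file.s3.oregon')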

Upvotes: 0

Views: 2340

Answers (3)

Alex Anderson

Reputation: 670

My personal favorite solution is Cyberduck. You log in with your S3 API key credentials, and it works just like a file-system explorer. If you drag in your folder of 20,000 files, it uploads them just like that. Downloading is just as simple.

Upvotes: 0

koolhead17

Reputation: 1964

Alternatively, you could use Minio Client, aka mc.

This can be achieved with mc mirror:

    $ mc mirror localdir S3alias/remotebucket
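
Here S3alias is a host alias you configure in mc beforehand. A minimal sketch of creating one, assuming the standard AWS endpoint and the config host add syntax used by Minio Client releases of that era (alias name and credentials are placeholders):

    $ mc config host add S3alias https://s3.amazonaws.com ACCESS_KEY SECRET_KEY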

If the connection drops due to a network issue or throttling, Minio Client will resume the upload from where it left off:

    mc:  Session safely terminated. To resume session ‘mc session resume yarbWRwf’

Hope it helps.

Disclaimer: I work for Minio.

Upvotes: 2

Matt Domsch

Reputation: 486

Broken pipe errors have historically happened when the socket_timeout value was too low. Please check your ~/.s3cfg file to ensure socket_timeout = 300 is set.
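
For example, a minimal excerpt of ~/.s3cfg with that setting (all other options omitted):

    [default]
    socket_timeout = 300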

The default was changed from 10 seconds to 300 seconds in:

    commit b503566c362c81dc8744a569820461c1c39bacae
    Author: Michal Ludvig <[email protected]>
    Date:   Mon Apr 11 02:01:08 2011 +0000

        * S3/Config.py: Increase socket_timeout from 10 secs to 5 mins.

Upvotes: 1
