Reputation: 111
Here's the situation: we're using a data loading service to ingest about 1 TB of JSON files from a directory in S3. We want to move those files into the ingestion directory in batches so that our loading service isn't overwhelmed. We're doing that batching with a local Python script that uses the boto3 client copy method. Here's a sample from boto3's docs:
import boto3

s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
s3.meta.client.copy(copy_source, 'otherbucket', 'otherkey')
When using boto3's copy method, is there any reason to think that the client downloads the file associated with the key and then does a PUT into the new otherbucket/otherkey location?
I know that there's a charge for every operation in S3, so I'm basically trying to confirm that we won't be charged for a download and an upload, and that we won't waste our own bandwidth that way.
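For context, the batching script looks roughly like the sketch below. The bucket names, ingestion prefix, and batch size are placeholders, and the boto3 import is deferred so the batching logic itself can be exercised without AWS credentials:

```python
from itertools import islice


def batches(keys, size):
    """Yield successive lists of at most `size` keys."""
    it = iter(keys)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def copy_batch(keys, source_bucket, target_bucket, prefix="ingest/"):
    """Copy one batch of objects into the ingestion prefix via boto3's copy.

    boto3 is imported lazily here; bucket names and prefix are placeholders.
    """
    import boto3

    s3 = boto3.client("s3")
    for key in keys:
        s3.copy(
            {"Bucket": source_bucket, "Key": key},
            target_bucket,
            prefix + key,
        )
```

Each batch is handed to the loading service only after the previous one has drained, which is what keeps it from being overwhelmed.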
Upvotes: 1
Views: 564
Reputation: 179384
This is a PUT+Copy. It's a single request sent to the target bucket, specifying the source bucket and object.
It is not a download/upload, but you are still charged for the PUT request against the target bucket, and for the GET request that the target bucket sends to the source bucket to fetch the content.
The data is transferred internally within S3, so it's not using your Internet bandwidth, but the source bucket is billed for cross-region bandwidth if the source and target buckets are in different regions. This is billed at a lower rate than the "out to the Internet" bandwidth charge.
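Note that `client.copy` is a managed transfer that may split a large object into multiple copy-part requests, while `client.copy_object` maps to exactly one CopyObject request (limited to source objects up to 5 GB). A minimal sketch of the single-request form, with placeholder bucket and key names:

```python
def copy_source(bucket, key):
    """Build the CopySource dict that copy_object expects (names are placeholders)."""
    return {"Bucket": bucket, "Key": key}


def server_side_copy(src_bucket, src_key, dst_bucket, dst_key):
    """Issue one CopyObject request; S3 transfers the bytes internally."""
    import boto3  # imported lazily so copy_source() works without boto3 installed

    s3 = boto3.client("s3")
    s3.copy_object(
        CopySource=copy_source(src_bucket, src_key),
        Bucket=dst_bucket,
        Key=dst_key,
    )
```

Either way, the object bytes never leave S3, so your local machine's bandwidth is not involved.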
See: Copying Objects in a Single Operation
Upvotes: 5