KumarfromIndia

Reputation: 9

Advice on copying data from one s3 bucket to another storage

As the title says, I am new to AWS and went through this post to find the right approach. Could you please advise on the right approach given the following considerations?

We expect the client to upload a batch of files to a source_s3 bucket on the 1st of every month (12 times a year). We would then copy them to the target_s3 bucket in our VPC, which we use as part of our web app development.

File size assumption: 300 MB to 1 GB each

File count each month: 7-10

File format: CSV

Also, the files in target_s3 will be used as part of a Lambda calculation when a user triggers it in the UI, so does it make sense to store the files as Parquet in target_s3?

Upvotes: 0

Views: 41

Answers (1)

John Rotenstein

Reputation: 270059

Copying objects to another bucket

There are two ways you can copy objects to another bucket when they arrive:

  • Use S3 Replication, which will automatically copy the objects. It requires Versioning to be active on both buckets, or
  • Write an AWS Lambda function that is triggered by an S3 Event. You would code the Lambda function to copy the object to the other bucket. This method is good if you wish to be more selective rather than copying every object that arrives (see the sketch after this list). (Example 1 | Example 2)

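For the Lambda option, here is a minimal sketch of such a copy function in Python with boto3. The TARGET_BUCKET environment variable, the .csv filter and the bucket names are assumptions for illustration; you would configure an S3 Event notification on source_s3 to invoke this function when an object is created.

```python
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")

# Assumed environment variable naming the destination bucket.
TARGET_BUCKET = os.environ.get("TARGET_BUCKET", "target-s3-bucket")


def lambda_handler(event, context):
    """Copy each newly created object from the source bucket to the target bucket."""
    for record in event["Records"]:
        source_bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Example of being selective: only copy CSV files.
        if not key.lower().endswith(".csv"):
            continue

        s3.copy_object(
            Bucket=TARGET_BUCKET,
            Key=key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )

    return {"status": "done"}
```

Note that a single CopyObject call handles objects up to 5 GB, which comfortably covers the 300 MB to 1 GB files described in the question; larger objects would need a multipart copy.
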
Preferred file format

The choice of file format is totally up to you. It would depend on what your source system can generate and what format the code in your Lambda function can process.

Pandas can read both formats. CSV is annoying because there is no single standard (delimiters, quoting and encodings vary), whereas Parquet embeds the schema and data types.
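
If you do decide to convert, a minimal sketch of turning an incoming CSV into Parquet with Pandas is shown below. The bucket names and paths are placeholders, and it assumes s3fs and pyarrow (or fastparquet) are available in the environment so Pandas can read and write s3:// paths and Parquet files.

```python
import pandas as pd

# Hypothetical source object; pandas uses s3fs under the hood for s3:// URLs.
df = pd.read_csv("s3://source-s3-bucket/2024-06-01/sales.csv")

# Writing Parquet preserves the schema and column dtypes, so the Lambda
# calculation can read it back without re-parsing CSV quirks.
df.to_parquet("s3://target-s3-bucket/2024-06-01/sales.parquet", index=False)

# Reading it back later is a single call.
df2 = pd.read_parquet("s3://target-s3-bucket/2024-06-01/sales.parquet")
```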

Upvotes: 0
