Reputation: 9
As the title says, I am new to AWS and went through this post to find the right approach. Can you please advise on the right approach given the following considerations?
We expect the client to upload a batch of files to a source_s3 bucket on the 1st of every month (12 times a year). We would then copy them to the target_s3 bucket in our VPC that we use as part of the web app development.
File size assumption: 300 MB to 1 GB each
File count each month: 7-10
File format: CSV
Also, the files in target_s3 will be used as part of a Lambda calculation when a user triggers it in the UI. So does it make sense to store the files as Parquet in target_s3?
Upvotes: 0
Views: 41
Reputation: 270059
There are two ways you can copy objects to another bucket when they arrive:

- Configure an Amazon S3 Event Notification on the source bucket that triggers an AWS Lambda function, which copies each new object to the target bucket, or
- Configure S3 Replication on the source bucket, which copies newly uploaded objects to the target bucket automatically.
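A minimal sketch of the event-driven option, assuming a Lambda function subscribed to the source bucket's `ObjectCreated` events; the `TARGET_BUCKET` environment variable is a placeholder you would configure yourself:

```python
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")
TARGET_BUCKET = os.environ["TARGET_BUCKET"]  # placeholder: name of your target_s3 bucket

def lambda_handler(event, context):
    # One S3 event notification can contain multiple records
    for record in event["Records"]:
        source_bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in event notifications
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Server-side copy; the object data never passes through the function
        s3.copy_object(
            Bucket=TARGET_BUCKET,
            Key=key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )
```

Note that `copy_object` handles objects up to 5 GB in a single call, so your 300 MB to 1 GB files are well within the limit.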
The choice of file format is totally up to you. It would depend on what your source system can generate and what format the code in your Lambda function can process.
Pandas can read both formats. CSV files are annoying because there is no 'single standard', whereas Parquet includes the schema and data types.
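For example, the copying Lambda function could convert each CSV to Parquet on the way in. This is a sketch, assuming pandas plus pyarrow and s3fs are packaged with the function (for example, in a layer); the file names are placeholders:

```python
import pandas as pd

# Read the client's CSV straight from S3 (s3:// paths require s3fs)
df = pd.read_csv("s3://source_s3/monthly_upload.csv")

# Write it back as Parquet; the schema and data types travel with the file
df.to_parquet("s3://target_s3/monthly_upload.parquet", index=False)
```

Keep in mind that loading a 1 GB CSV into pandas needs a generous memory setting on the function (Lambda supports up to 10,240 MB).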
Upvotes: 0