Reputation: 385
Our goal is to send multiple streams from our application into Redshift for analysis. One of us had the idea of sending all the streams into the same S3 bucket under different prefixes, the intention being to simplify our IAM roles and S3 bucket usage. That way we could have one bucket per environment (dev, staging, and prod) that all the streams would run through. I am somewhat new to this technology, but it seems like a non-standard approach, and I am concerned it might introduce unexpected bottlenecks down the road. Has anyone tried this? How did it work out?
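This is roughly what we had in mind for the IAM side (bucket, prefix, and role names below are made up for illustration): each stream gets a role that can only write under its own prefix in the shared per-environment bucket.

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical names: one shared bucket per environment, one prefix per stream.
BUCKET = "my-app-analytics-prod"
STREAM_PREFIX = "clickstream/"           # e.g. clickstream/, orders/, audit/ ...
ROLE_NAME = "delivery-clickstream-prod"  # role assumed by this stream's delivery

# The policy limits this stream's role to writing only under its own prefix,
# so sharing one bucket does not mean sharing write access to everything in it.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
            "Resource": f"arn:aws:s3:::{BUCKET}/{STREAM_PREFIX}*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
    ],
}

iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="write-own-prefix-only",
    PolicyDocument=json.dumps(policy),
)
```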
Upvotes: 0
Views: 916
Reputation: 1968
If the prefixes are different it should scale well. The only bottleneck I have seen with S3 was when we used versioning and did not clean up the "delete markers" (see the lifecycle sketch below). Otherwise I have seen a bucket in the low petabytes of data and it worked fine.
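If you do turn on versioning, a lifecycle rule is the easiest way to keep those markers from piling up. Rough sketch with boto3 (the bucket name and retention period are made up):

```python
import boto3

s3 = boto3.client("s3")

# Sketch only: expire old noncurrent versions and automatically remove
# delete markers that have no versions left behind them.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-app-analytics-prod",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "clean-up-versioning-leftovers",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    },
)
```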
For the data volume you expect (a few million messages a month, 5k or less), I have seen an S3 bucket that probably held the equivalent of years of your data, all small objects. The structure followed the AWS suggestion: flat, with long random hashes directly in the root. The application worked fine.
In the low tens of millions of objects, a regular "folder" structure also works without any issues; I have seen over 50 million smaller objects spread across just a few root "folders".
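For illustration, the two key layouts I mean look roughly like this (the stream name and file extension are invented):

```python
import uuid
from datetime import datetime, timezone

def flat_key() -> str:
    """Flat structure: a long random hash directly in the bucket root."""
    return uuid.uuid4().hex

def folder_key(stream: str) -> str:
    """'Folder' structure: stream prefix plus a date path, then a random name."""
    now = datetime.now(timezone.utc)
    return f"{stream}/{now:%Y/%m/%d}/{uuid.uuid4().hex}.json"

print(flat_key())                 # e.g. 3f9a1c5e...
print(folder_key("clickstream"))  # e.g. clickstream/2024/01/31/3f9a1c5e....json
```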
Upvotes: 1