Reputation: 6190
We have a use case where we want to use S3 to stage event-based and product metrics temporarily until they are loaded into a relational data warehouse (Oracle). These metrics would be sent by more than 200 application servers to S3 and persisted in separate files per metric per server. The frequency of some of the metrics could be high, e.g. the number of active HTTP sessions or the memory usage on each app server, sent every minute. Once the metrics are persisted in S3, a process on the data warehouse side would read the CSV files and load them into Oracle. We chose S3 over a queue (Kafka/ActiveMQ/RabbitMQ) due to various factors, including cost, durability and replication.

I have a few questions related to the write and read mechanisms with S3.
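For concreteness, here is roughly what we have in mind for the write path on each application server. This is only a minimal sketch; the bucket name, key layout and boto3 usage are illustrative assumptions, not a finished design:

```python
import csv
import io
from datetime import datetime, timezone

import boto3  # AWS SDK for Python (assumed client library)

s3 = boto3.client("s3")
BUCKET = "metrics-staging"  # hypothetical bucket name


def push_metric(server_id: str, metric_name: str, value: float) -> None:
    """Write one metric sample as a small CSV object, keyed per metric per server."""
    ts = datetime.now(timezone.utc)
    # One object per sample; S3 has no append, so each sample gets its own key.
    key = f"{metric_name}/{server_id}/{ts:%Y%m%d%H%M%S}.csv"

    buf = io.StringIO()
    csv.writer(buf).writerow([server_id, metric_name, ts.isoformat(), value])
    s3.put_object(Bucket=BUCKET, Key=key, Body=buf.getvalue().encode("utf-8"))


# e.g. called once a minute by the collector on each app server
push_metric("app-042", "active_http_sessions", 137)
```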
Thanks
Upvotes: 0
Views: 324
Reputation: 9464
FYI, 200 servers sending one request per minute is not "high". You are likely over-engineering this. SQS is simple, highly redundant and available, and would likely meet your needs far better than rolling your own solution.
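For example, sending the same per-minute sample to SQS is a single call per message. This is only a sketch; the queue URL and payload shape are assumptions:

```python
import json
from datetime import datetime, timezone

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/metrics"  # hypothetical queue


def send_metric(server_id: str, metric_name: str, value: float) -> None:
    """Each sample becomes one SQS message; a consumer batches them into Oracle."""
    body = json.dumps({
        "server": server_id,
        "metric": metric_name,
        "ts": datetime.now(timezone.utc).isoformat(),
        "value": value,
    })
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)


send_metric("app-042", "active_http_sessions", 137)
```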
To answer your questions in detail:
1) No, you cannot "guarantee delivery", especially with asynchronous S3 operations. You could design recoverable operations, but not guaranteed delivery.
2) That isn't what S3 is for: S3 only supports whole-object writes, so you cannot append to an existing file. You would have to build a system that accumulates lots of small files, and you probably don't want to do this. Updating a file (especially from multiple threads) is dangerous, because each update replaces the entire object.
3) If you must do this, use the object API, process each file one at a time, and delete it when you are done (a rough sketch follows below). You are much better off building a queue-based system.
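If you do stay with S3, a minimal consumer along those lines might look like the following. The bucket name, key prefix and the `load_into_oracle` stub are assumptions, and error handling is omitted:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "metrics-staging"  # hypothetical bucket name


def load_into_oracle(csv_bytes: bytes) -> None:
    """Placeholder for the warehouse-side load (e.g. parse CSV rows and insert)."""
    ...


def drain_metrics(prefix: str = "") -> None:
    """List pending metric files, load each one, then delete it so it is not re-read."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
            load_into_oracle(body)
            # Delete only after a successful load; a failure before this point
            # simply leaves the file to be picked up on the next run.
            s3.delete_object(Bucket=BUCKET, Key=key)


drain_metrics()
```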
Upvotes: 2