Andy Dufresne

Reputation: 6190

Using AWS S3 as an intermediate storage layer for monitoring platform

We have a use case where we want to use S3 as temporary storage for event-based and product metrics until they are loaded into a relational data warehouse (Oracle). The metrics would be sent by more than 200 application servers to S3 and persisted in separate files per metric per server. The frequency of some metrics could be high, e.g. the number of active HTTP sessions or the memory usage on each app server, sent every minute. Once the metrics are persisted in S3, a process on the data warehouse side would read the CSV files and load them into Oracle. We chose S3 over a queue (Kafka/ActiveMQ/RabbitMQ) for various reasons, including cost, durability and replication. I have a few questions about the write and read mechanisms with S3:

  1. For event-based metrics, how can we write to S3 so that the app server is not blocked? I see that the Java SDK supports asynchronous writes (see the sketch after this list). Would that guarantee delivery?
  2. How can we update a CSV file already created on S3 by appending a record? From what I have read, an S3 object cannot be updated in place. What would be an efficient way to push monitoring metrics to S3 at periodic intervals?
  3. When reading from S3, performance isn't a critical requirement. What would be an efficient way of loading the CSV files into Oracle? Two options we considered were using the GetObject API from the Java SDK, or mounting S3 folders as NFS shares and creating external tables. Are there other efficient ways of reading?
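
For context, this is roughly the kind of non-blocking write we were thinking of for question 1 (AWS SDK for Java v2; the bucket and key names below are just placeholders):

```java
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class AsyncMetricWriter {
    private final S3AsyncClient s3 = S3AsyncClient.create();

    public void pushMetric(String csvLine) {
        // Placeholder naming: one small object per metric sample per server.
        PutObjectRequest request = PutObjectRequest.builder()
                .bucket("my-metrics-bucket")
                .key("metrics/server-01/http-sessions/" + System.currentTimeMillis() + ".csv")
                .build();

        // Returns immediately; the app server thread never blocks on the upload.
        s3.putObject(request, AsyncRequestBody.fromString(csvLine))
          .whenComplete((resp, err) -> {
              if (err != null) {
                  // The write can still fail, so we would need our own retry/recovery.
                  System.err.println("S3 upload failed: " + err.getMessage());
              }
          });
    }
}
```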

Thanks

Upvotes: 0

Views: 324

Answers (1)

Rob Conklin

Reputation: 9464

FYI, 200 servers sending one request per minute is not "high". You are likely over-engineering this. SQS is simple, highly redundant and available, and would likely meet your needs far better than rolling your own solution.
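
For example, each app server could simply send every metric sample as an SQS message, and a consumer on the warehouse side could poll the queue and batch-insert into Oracle. A minimal sketch (AWS SDK for Java v2; the queue URL is a placeholder):

```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class SqsMetricSender {
    private final SqsClient sqs = SqsClient.create();

    // Placeholder queue URL; create the queue once and configure it on every app server.
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/app-metrics";

    public void send(String metricCsvLine) {
        sqs.sendMessage(SendMessageRequest.builder()
                .queueUrl(QUEUE_URL)
                .messageBody(metricCsvLine)
                .build());
    }
}
```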

To answer your questions in detail:

1) No, you cannot "guarantee delivery", especially with asynchronous S3 operations. You could design recoverable operations, but not guaranteed delivery.
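
As an illustration of a recoverable (still not guaranteed) write, one option is to spool each record to local disk first and delete the local copy only after S3 acknowledges the upload. A rough sketch, with an assumed spool directory and bucket name:

```java
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

public class RecoverableMetricWriter {
    private final S3AsyncClient s3 = S3AsyncClient.create();

    // Assumed local spool directory (must already exist).
    private static final Path SPOOL_DIR = Paths.get("/var/spool/metrics");

    public void write(String csvLine) throws IOException {
        // 1. Persist locally first, so a crash or failed upload loses nothing.
        Path spooled = Files.write(
                SPOOL_DIR.resolve(UUID.randomUUID() + ".csv"),
                csvLine.getBytes(StandardCharsets.UTF_8));

        // 2. Upload asynchronously; remove the local copy only on success.
        //    A startup task can re-upload anything still sitting in the spool directory.
        s3.putObject(PutObjectRequest.builder()
                        .bucket("my-metrics-bucket")
                        .key("metrics/" + spooled.getFileName())
                        .build(),
                AsyncRequestBody.fromFile(spooled))
          .thenRun(() -> spooled.toFile().delete());
    }
}
```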

2) That isn't what S3 is designed for; S3 only writes whole objects. You would have to build a system that adds lots of small files instead, and you probably don't want to do this. Updating a file (especially from multiple threads) is dangerous, because each update replaces the entire file.
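
If you do stay on S3, the workable pattern is to buffer metrics in memory and flush each interval to a brand-new small object rather than appending to an existing one. A rough sketch (AWS SDK for Java v2; bucket and key layout are placeholders):

```java
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.time.Instant;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PeriodicMetricFlusher {
    private final S3AsyncClient s3 = S3AsyncClient.create();
    private final Queue<String> buffer = new ConcurrentLinkedQueue<>();

    public PeriodicMetricFlusher() {
        // Flush once a minute: every flush writes a brand-new object, nothing is appended.
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleAtFixedRate(this::flush, 1, 1, TimeUnit.MINUTES);
    }

    public void record(String csvLine) {
        buffer.add(csvLine);    // called by app server threads, never blocks on S3
    }

    private void flush() {
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = buffer.poll()) != null) {
            body.append(line).append('\n');
        }
        if (body.length() == 0) {
            return;
        }
        // Placeholder key layout: server/metric/timestamp.csv
        String key = "metrics/server-01/http-sessions/" + Instant.now().toEpochMilli() + ".csv";
        s3.putObject(PutObjectRequest.builder()
                        .bucket("my-metrics-bucket")
                        .key(key)
                        .build(),
                AsyncRequestBody.fromString(body.toString()));
    }
}
```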

3) If you must do this, use the object API, process each file one at a time, and delete it when you are done. But you are much better off building a queue-based system.
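
A minimal sketch of that read-process-delete loop with the object API (AWS SDK for Java v2; the bucket name and the Oracle insert are placeholders):

```java
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.DeleteObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class MetricLoader {
    private static final String BUCKET = "my-metrics-bucket";   // placeholder
    private final S3Client s3 = S3Client.create();

    public void loadAll() throws IOException {
        ListObjectsV2Request list = ListObjectsV2Request.builder()
                .bucket(BUCKET)
                .prefix("metrics/")
                .build();

        // Iterate over every pending CSV object, one file at a time.
        for (S3Object obj : s3.listObjectsV2Paginator(list).contents()) {
            GetObjectRequest get = GetObjectRequest.builder()
                    .bucket(BUCKET).key(obj.key()).build();

            try (ResponseInputStream<GetObjectResponse> in = s3.getObject(get);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String row;
                while ((row = reader.readLine()) != null) {
                    insertIntoOracle(row);
                }
            }

            // Delete only after the rows have been committed to Oracle.
            s3.deleteObject(DeleteObjectRequest.builder()
                    .bucket(BUCKET).key(obj.key()).build());
        }
    }

    private void insertIntoOracle(String csvRow) {
        // Placeholder for a (batched) JDBC insert into the warehouse table.
    }
}
```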

Upvotes: 2
