Andy Dufresne

Reputation: 6190

Using AWS S3 as an intermediate storage layer for monitoring platform

We have a use case where we want to use S3 as temporary storage for event-based and product metrics until they are loaded into a relational data warehouse (Oracle). The metrics would be sent by more than 200 application servers to S3 and persisted in separate files per metric per server. The frequency of some metrics could be high, e.g. the number of active HTTP sessions or the memory usage on each app server, sent every minute. Once the metrics are persisted in S3, a process on the data warehouse side would read the CSV files and load them into Oracle. We chose S3 over a queue (Kafka/ActiveMQ/RabbitMQ) for various reasons, including cost, durability and replication. I have a few questions about the write and read mechanisms with S3:

  1. For event-based metrics, how can we write to S3 so that the app server is not blocked? I see that the Java SDK supports asynchronous writes (see the sketch after this list). Would that guarantee delivery?
  2. How can we update a CSV file already created on S3 by appending a record? From what I have read, an S3 object cannot be updated in place. What would be an efficient way to push monitoring metrics to S3 at periodic intervals?
  3. When reading from S3, performance isn't a critical requirement. What would be an efficient way of loading the CSV files into Oracle? Two options we considered were using the GetObject API from the Java SDK, or mounting S3 folders as NFS shares and creating external tables. Are there other efficient ways of reading?
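
For context, this is roughly the kind of non-blocking write we were thinking of for question 1 (AWS SDK for Java v2; the bucket and key names below are just placeholders):

```java
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class AsyncMetricWriter {
    private final S3AsyncClient s3 = S3AsyncClient.create();

    public void pushMetric(String csvLine) {
        // Placeholder naming: one small object per metric sample per server.
        PutObjectRequest request = PutObjectRequest.builder()
                .bucket("my-metrics-bucket")
                .key("metrics/server-01/http-sessions/" + System.currentTimeMillis() + ".csv")
                .build();

        // Returns immediately; the app server thread never blocks on the upload.
        s3.putObject(request, AsyncRequestBody.fromString(csvLine))
          .whenComplete((resp, err) -> {
              if (err != null) {
                  // The write can still fail, so we would need our own retry/recovery.
                  System.err.println("S3 upload failed: " + err.getMessage());
              }
          });
    }
}
```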

Thanks

Upvotes: 0

Views: 324

Answers (1)

Rob Conklin

Reputation: 9464

FYI, 200 servers sending one request per minute is not "high". You are likely over-engineering this. SQS is simple, highly redundant and available, and would likely meet your needs far better than rolling your own solution.
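
For example, each app server could simply send every metric sample as an SQS message, and a consumer on the warehouse side could poll the queue and batch-insert into Oracle. A minimal sketch (AWS SDK for Java v2; the queue URL is a placeholder):

```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class SqsMetricSender {
    private final SqsClient sqs = SqsClient.create();

    // Placeholder queue URL; create the queue once and configure it on every app server.
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/app-metrics";

    public void send(String metricCsvLine) {
        sqs.sendMessage(SendMessageRequest.builder()
                .queueUrl(QUEUE_URL)
                .messageBody(metricCsvLine)
                .build());
    }
}
```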

To answer your questions in detail:

1) No, you cannot "guarantee delivery", especially with asynchronous S3 operations. You could design recoverable operations, but not guaranteed delivery.
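
As an illustration of a recoverable (still not guaranteed) write, one option is to spool each record to local disk first and delete the local copy only after S3 acknowledges the upload. A rough sketch, with an assumed spool directory and bucket name:

```java
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

public class RecoverableMetricWriter {
    private final S3AsyncClient s3 = S3AsyncClient.create();

    // Assumed local spool directory (must already exist).
    private static final Path SPOOL_DIR = Paths.get("/var/spool/metrics");

    public void write(String csvLine) throws IOException {
        // 1. Persist locally first, so a crash or failed upload loses nothing.
        Path spooled = Files.write(
                SPOOL_DIR.resolve(UUID.randomUUID() + ".csv"),
                csvLine.getBytes(StandardCharsets.UTF_8));

        // 2. Upload asynchronously; remove the local copy only on success.
        //    A startup task can re-upload anything still sitting in the spool directory.
        s3.putObject(PutObjectRequest.builder()
                        .bucket("my-metrics-bucket")
                        .key("metrics/" + spooled.getFileName())
                        .build(),
                AsyncRequestBody.fromFile(spooled))
          .thenRun(() -> spooled.toFile().delete());
    }
}
```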

2) That isn't what S3 is designed for; S3 only writes whole objects. You would have to build a system that adds lots of small files instead, and you probably don't want to do this. Updating a file (especially from multiple threads) is dangerous, because each update replaces the entire file.
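
If you do stay on S3, the workable pattern is to buffer metrics in memory and flush each interval to a brand-new small object rather than appending to an existing one. A rough sketch (AWS SDK for Java v2; bucket and key layout are placeholders):

```java
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.time.Instant;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PeriodicMetricFlusher {
    private final S3AsyncClient s3 = S3AsyncClient.create();
    private final Queue<String> buffer = new ConcurrentLinkedQueue<>();

    public PeriodicMetricFlusher() {
        // Flush once a minute: every flush writes a brand-new object, nothing is appended.
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleAtFixedRate(this::flush, 1, 1, TimeUnit.MINUTES);
    }

    public void record(String csvLine) {
        buffer.add(csvLine);    // called by app server threads, never blocks on S3
    }

    private void flush() {
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = buffer.poll()) != null) {
            body.append(line).append('\n');
        }
        if (body.length() == 0) {
            return;
        }
        // Placeholder key layout: server/metric/timestamp.csv
        String key = "metrics/server-01/http-sessions/" + Instant.now().toEpochMilli() + ".csv";
        s3.putObject(PutObjectRequest.builder()
                        .bucket("my-metrics-bucket")
                        .key(key)
                        .build(),
                AsyncRequestBody.fromString(body.toString()));
    }
}
```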

3) If you must do this, use the object API, process each file one at a time, and delete it when you are done. But you are much better off building a queue-based system.
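
A minimal sketch of that read-process-delete loop with the object API (AWS SDK for Java v2; the bucket name and the Oracle insert are placeholders):

```java
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.DeleteObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class MetricLoader {
    private static final String BUCKET = "my-metrics-bucket";   // placeholder
    private final S3Client s3 = S3Client.create();

    public void loadAll() throws IOException {
        ListObjectsV2Request list = ListObjectsV2Request.builder()
                .bucket(BUCKET)
                .prefix("metrics/")
                .build();

        // Iterate over every pending CSV object, one file at a time.
        for (S3Object obj : s3.listObjectsV2Paginator(list).contents()) {
            GetObjectRequest get = GetObjectRequest.builder()
                    .bucket(BUCKET).key(obj.key()).build();

            try (ResponseInputStream<GetObjectResponse> in = s3.getObject(get);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String row;
                while ((row = reader.readLine()) != null) {
                    insertIntoOracle(row);
                }
            }

            // Delete only after the rows have been committed to Oracle.
            s3.deleteObject(DeleteObjectRequest.builder()
                    .bucket(BUCKET).key(obj.key()).build());
        }
    }

    private void insertIntoOracle(String csvRow) {
        // Placeholder for a (batched) JDBC insert into the warehouse table.
    }
}
```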

Upvotes: 2
