Reputation: 1635
I am looking for options for reliable (and speedy) storage for small amounts of sensor data coming in from (getting optimistic here) millions of endpoints. The scale I'm talking about is 1M endpoints, each sending 100 bytes every minute. This data needs to be available for analysis shortly after it arrives. Additionally, it will be kept for a few years and may exceed 100TB of total storage.
Is S3 the solution to this, or would I be better off hosting my own NoSQL cluster, such as Cassandra or MongoDB?
Please let me know if I have left out any information.
Upvotes: 0
Views: 1797
Reputation: 6726
Yes, you could. But S3 has no query mechanism and no way to read multiple objects in one request. You would also have no way to inspect or validate the data before it's written.
This might be a better idea: have each client write its readings to an Amazon SQS queue, and have a pool of workers read from the queue and load the data into your datastore. That decouples receipt of the data from the load/storage phase, as the sketch below illustrates.
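A minimal sketch of that pipeline with boto3 (this is my illustration, not from the original answer; the queue URL, region, and `store_batch` callback are placeholders I've assumed):

```python
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
# Hypothetical queue URL; substitute your own.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/sensor-ingest"

# Producer side: each endpoint pushes its ~100-byte reading onto the queue.
def send_reading(sensor_id, payload):
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"sensor": sensor_id, "data": payload}),
    )

# Consumer side: a worker drains up to 10 messages per request (the SQS
# per-call maximum), hands them to the datastore, then deletes them.
def drain_and_store(store_batch):
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling reduces empty receives
    )
    messages = resp.get("Messages", [])
    if messages:
        store_batch([json.loads(m["Body"]) for m in messages])
        for m in messages:
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=m["ReceiptHandle"])
```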
Note that many Amazon services have a per-request charge. For SQS it is $0.01 per 10,000 requests. With 1 million clients each writing one message per minute, the request charges alone would run over $40,000 a month, and that doubles once you account for reading the messages back off the queue:
(((1000000 * (60*24*30)) / 10000) * $0.01) * 2 = $86,400
For S3, it's $0.01 per 1,000 POSTs (client writes) and $0.01 per 10,000 GETs (reads). For 1 million clients, the POST charges alone come to $432,000 a month, so with reads your per-request charges could approach $500,000 per month; the arithmetic below spells this out.
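For reference, the same arithmetic for both services, using the per-request prices quoted above and assuming one read per write:

```python
# Back-of-the-envelope check of the per-request charges quoted above.
CLIENTS = 1_000_000
REQUESTS_PER_MONTH = CLIENTS * 60 * 24 * 30   # one write per client per minute

# SQS at $0.01 per 10,000 requests, counting both writes and reads:
sqs_monthly = (REQUESTS_PER_MONTH / 10_000) * 0.01 * 2
print(f"SQS: ${sqs_monthly:,.0f}/month")      # SQS: $86,400/month

# S3 at $0.01 per 1,000 POSTs plus $0.01 per 10,000 GETs:
s3_monthly = (REQUESTS_PER_MONTH / 1_000) * 0.01 + (REQUESTS_PER_MONTH / 10_000) * 0.01
print(f"S3:  ${s3_monthly:,.0f}/month")       # S3:  $475,200/month
```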
Ultimately, at 1 million clients, you will likely need to run your own receiving endpoints simply for economic reasons.
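As a rough illustration only, a self-hosted receiver could be as simple as an HTTP handler that batches incoming readings before touching the datastore. `flush_batch` here is a hypothetical hook, and a real deployment would also need load balancing, authentication, and persistence across failures:

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BATCH_SIZE = 1000
_batch = []
_lock = threading.Lock()

def flush_batch(readings):
    """Hypothetical hook: bulk-insert into Cassandra, or write one S3 object."""
    pass

class IngestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        reading = self.rfile.read(length)       # the ~100-byte payload
        with _lock:
            _batch.append(reading)
            full = _batch[:] if len(_batch) >= BATCH_SIZE else None
            if full:
                _batch.clear()
        if full:
            flush_batch(full)                   # amortize datastore writes
        self.send_response(204)                 # empty reply keeps traffic small
        self.end_headers()

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), IngestHandler).serve_forever()
```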
Upvotes: 3