Reputation: 459
We are building a system that will need to serve loads of small requests from day one. By "loads" I mean ~5,000 queries per second. For each query we need to retrieve ~20 records from noSQL database. There will be two batch reads - 3-4 records at first and then 16-17 reads instantly after that (based on the result of first read). That would be ~100,000 objects to read per second.
Until now we were thinking about using DynamoDB for this as it's really easy to start with.
Storage is not something I would be worried about as the objects will be really tiny. What I am worried about is cost of reads. DynamoDB costs $0.0113 per hour per 100 eventually consistent (which is fine for us) reads per second. That is $11,3 per hour for us provided that all objects are up to 1KB in size. And that would be $5424 per month based on 16 hours/day average usage.
So... $5424 per month.
I would consider other options but I am worried about maintenance issues, costs etc. I have never worked with such setups before so your advice would be really valuable.
What would be the most cost-effective (yet still hassle-free) solution for such read/write intensive application?
Upvotes: 11
Views: 11162
Reputation: 1
Here is what I recommend in sequence.
Identify your use case and choose the correct db. We test MySQL and MongoDb regularly for all kinds of workloads (OLTP, Analytics, etc). In all cases we have tested with, MySQL outperforms MongoDb and is cheaper ($/TPS) compared to MongoDb. MongoDb has other advantages but that is another story ... since we are talking about performance here.
Try to cache your queries in RAM (by provisioning adequate RAM).
If you are bottle necked on RAM, then you can try a SSD caching solution which takes advantage of ephemeral SSD. This works if your workload is cache friendly. You can save loads of money as ephemeral SSD is typically not charged by the cloud provider.
Try PIOPS/RAID or a combination to create adequate IOPS for your application.
Upvotes: 0
Reputation: 15216
From your description above, I'm assuming that your 5,000 queries per second are entirely read operations. This is essentially what we'd call a data warehouse use case. What are your availability requirements? Does it have to be hosted on AWS and friends, or can you buy your own hardware to run in-house? What does your data look like? What does the logic which consumes this data look like?
You might get the sense that there really isn't enough information here to answer the question definitively, but I can at least offer some advice.
First, if your data is relatively small and your queries are simple, save yourself some hassle and make sure you're querying from RAM instead of disk. Any modern RDBMS with support for in-memory caching/tablespaces will do the trick. Postgres and MySQL both have features for this. In the case of Postgres make sure you've tuned the memory parameters appropriately as the out-of-the-box configuration is designed to run on pretty meager hardware. If you must use a NoSQL option, depending on the structure of your data Redis is probably a good choice (it's also primarily in-memory). However in order to say which flavor of NoSQL might be the best fit we'd need to know more about the structure of the data that you're querying, and what queries you're running.
If the queries boil down to SELECT * FROM table WHERE primary_key = {CONSTANT}
- don't bother messing with NoSQL - just use an RDBMS and learn how to tune the dang thing. This is doubly true if you can run it on your own hardware. If the connection count is high, use read slaves to balance the load.
Long-after-the-fact Edit (5/7/2013): Something I should've mentioned before: EC2 is a really really crappy place to measure performance of self-managed database nodes. Unless you're paying out the nose, your I/O perf will be terrible. Your choices are to either pay big money for provisioned IOPS, RAID together a bunch of EBS volumes, or rely on ephemeral storage whilst syncing a WAL off to S3 or similar. All of these options are expensive and difficult to maintain. All of these options have varying degrees of performance.
I discovered this for a recent project, so I switched to Rackspace. The performance increased tremendously there, but I noticed that I was paying a lot for CPU and RAM resources when really I just need fast I/O. Now I host with Digital Ocean. All of DO's storage is SSD. Their CPU performance is kind of crappy in comparison to other offerings, but I'm incredibly I/O bound so I Just Don't Care. After dropping Postgres' random_page_cost
to 2, I'm humming along quite nicely.
Moral of the story: profile, tune, repeat. Ask yourself what-if questions and constantly validate your assumptions.
Another long-after-the-fact-edit (11/23/2013): As an example of what I'm describing here, check out the following article for an example of using MySQL 5.7 with the InnoDB memcached plugin to achieve 1M QPS: http://dimitrik.free.fr/blog/archives/11-01-2013_11-30-2013.html#2013-11-22
Upvotes: 18
Reputation: 43884
By "loads" I mean ~5,000 queries per second.
Ah that's not so much, even SQL can handle that. So you are already easily within the limits of what most modern DBs can handle. However they can only handle this with the right:
That would be ~100,000 objects to read per second.
Now that's more of a high load scenario. Must you read these in such a fragmented manner? If so then (as I said) you may require to look into spreading the load across replicated shards.
Storage is not something I would be worried about as the objects will be really tiny.
Mongo is aggresive with disk allocation so even with small objects it will still pre-allocate a lot of space, this is something to bare in mind.
So... $5424 per month.
Oh yea the billing thrills of Amazon :\
.
I would consider other options but I am worried about maintenance issues, costs etc. I have never worked with such setups before so your advice would be really valuable.
Now you hit the snag of it all. You can setup your own cluster but then you might end up paying that much in money and time (or way more) for the servers, people, admins and your own mantenance time. This is one reason why DynamoDB really shines here. For large setups who are looking to take the load and pain and stress of server management (trust me it is really painful, if your a Dev you might as well change your job title to server admin from now on) off of the company.
Considering to setup this yourself you would need:
Both of which could set you back 100's of thousands of pounds a year, I would personally bet for the managed approach if it fits your needs and budget. When your need grows beyond what managed Amazon DB can give you then move to your infrastructure.
I should amend that the cost effectiveness was done with quite a few black holes for example:
Both of these contribute me to place a scenario of:
Upvotes: 4