Reputation: 8882
We're planning to deploy a web application with Amazon OpsWorks, and I just wanted to check with you whether our architecture might have any design flaws.
We have 4 components: a load balancer, web servers, MongoDB, and ElasticSearch.
Here's how our components communicate:
At the front is a load balancer, which distributes HTTP requests across multiple web servers.
The web servers are stateless and can therefore be cloned whenever the load requires it. All web server instances are equal, and session information is stored in MongoDB.
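A minimal sketch of what that shared session store could look like, assuming an Express stack with express-session and connect-mongo (the question doesn't name a framework, so those libraries, the secret, and the hostname are assumptions):

```ts
import express from "express";
import session from "express-session";
import MongoStore from "connect-mongo";

const app = express();

app.use(
  session({
    secret: "replace-me",                                      // hypothetical secret
    resave: false,
    saveUninitialized: false,
    // Sessions live in MongoDB, not in web-server memory.
    store: MongoStore.create({ mongoUrl: "mongodb://mongo-host/sessions" }),
  })
);

app.get("/", (req, res) => {
  // Whichever cloned instance handles the request sees the same session.
  const s = req.session as typeof req.session & { hits?: number };
  s.hits = (s.hits ?? 0) + 1;
  res.send(`hits: ${s.hits}`);
});

app.listen(3000);
```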
In the "backend" we're planing to use the build-in cluster functionalities from MongoDB and ElasticSearch. Therefore each web server instance only connects to a single MongoDB and ElasticSearch master instance. MongoDB and ElasticSearch are then scaling accordingly. Furthermore the the ElasticSearch master speaks to the MongoDB master to retrieve data for building the index.
As we see it, the most challenging part of setting up such a system is configuring OpsWorks with the MongoDB and ElasticSearch clusters.
Many thanks in advance!
Upvotes: 3
Views: 3834
Reputation: 13065
if our architecture might have any design flaws.
Well, keep in mind that we can't tell much from a generic diagram. But here are some notes:
1) MongoDB isn't as easy to scale as other databases such as DynamoDB, Riak or Cassandra. For example, if you ever exceed the capacity of a single master (no matter how many slaves you have, all writes go to the single master), you'll have to shard. But switching to sharding is very disruptive and very tedious to set up.
If you don't expect to exceed the write capacity of one node, then you'll be fine on MongoDB.
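To make the disruption concrete, here is roughly what enabling sharding involves on the MongoDB side, run from the mongo shell against a mongos router; the database, collection, and shard key below are made up, and choosing that key is the part that's hard to undo:

```js
// Requires a config-server setup and mongos routers in front of the shards.
sh.enableSharding("app");
sh.shardCollection("app.events", { userId: 1 });  // shard key is effectively permanent
sh.status();   // inspect how chunks are distributed across shards
```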
2) What will you do for async tasks such as sending emails, creating long reports, etc?
It's possible to do these things in the request loop, and that's probably a fine way to get started. But as you add more boxes, the chances of failure go up. When a box dies, all of its in-flight async tasks go away and nobody knows what they were. You can also run into problems where one box gets heavily loaded with async tasks (using too much CPU or memory), and the problem gets worse and worse as it accumulates more tasks and completes them more slowly.
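One common way out is a job queue that outlives any single box. A minimal sketch using Agenda, a MongoDB-backed queue for Node.js (the library choice, connection string, and job names are assumptions, not something the question specifies):

```ts
import { Agenda } from "agenda";

// Jobs are persisted in MongoDB, so they survive a dead web box and can
// be drained by dedicated worker instances instead of the request loop.
const agenda = new Agenda({ db: { address: "mongodb://mongo-host/jobs" } });

agenda.define("send email", async (job) => {
  const { to, subject } = job.attrs.data;
  // ...hand the message to your mailer here...
});

await agenda.start();                                   // start the worker loop
await agenda.now("send email", { to: "user@example.com", subject: "Welcome" });
```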
Also, a front end like ELB has a 60-second idle timeout, which can cause problems if some of your requests take longer. (Spin them off into async jobs with polling or something.)
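A sketch of that "accept, then poll" pattern with hypothetical Express endpoints (all names invented); the in-memory job map is for illustration only, since with multiple boxes you'd keep job state in MongoDB so any instance can answer the poll:

```ts
import express from "express";
import crypto from "crypto";

const app = express();
const jobs = new Map<string, { status: "pending" | "done" | "failed"; result?: unknown }>();

app.post("/reports", (req, res) => {
  const id = crypto.randomUUID();
  jobs.set(id, { status: "pending" });
  buildReport(id).catch(() => jobs.set(id, { status: "failed" }));
  res.status(202).json({ id, poll: `/reports/${id}` });  // return well under 60 s
});

app.get("/reports/:id", (req, res) => {
  const job = jobs.get(req.params.id);
  if (!job) return res.sendStatus(404);
  res.json(job);
});

async function buildReport(id: string) {
  // ...work that may take far longer than ELB's idle timeout...
  jobs.set(id, { status: "done", result: { rows: [] } });
}

app.listen(3000);
```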
3) ELB doesn't support WebSockets. Keep that in mind if you think you might want WebSockets down the road.
Upvotes: 3
Reputation: 8284
There's no such thing as a master in ElasticSearch. You have master copies of shards and replicas of shards, but they are basically moved around your cluster by ElasticSearch; a node might hold the master copy of one shard and a replica of another. So you could simply put a load balancer in front of it.
However, you can specialize nodes to be data nodes or routing nodes as explained here: http://www.elasticsearch.org/guide/reference/modules/node/
The routing nodes effectively become load balancers. You could have a few of those (for redundancy) and distribute load between them. Alternatively, you could run a dedicated routing node on each web server: routing nodes are pretty lightweight, and you save a bit of bandwidth/latency since your web server talks to localhost and from there it's all internal ElasticSearch cluster traffic. A minimal config is sketched below.
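For reference, a hypothetical elasticsearch.yml for such a routing-only node, in the 0.x/1.x style the linked page covers:

```yaml
# Joins the cluster and routes requests, but stores no data
# and is never elected master.
node.master: false
node.data: false
http.enabled: true
```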
Upvotes: 2
Reputation: 105053
I'd recommend replacing MongoDB with Amazon DynamoDB (it has a Node.js SDK).
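For example, a minimal sketch with the v2 AWS SDK for Node.js and its DocumentClient (table name and items are invented for illustration):

```ts
import AWS from "aws-sdk";

// DocumentClient marshals plain JS objects to and from DynamoDB types.
const db = new AWS.DynamoDB.DocumentClient({ region: "us-east-1" });

await db
  .put({ TableName: "sessions", Item: { sessionId: "abc", userId: 42 } })
  .promise();

const { Item } = await db
  .get({ TableName: "sessions", Key: { sessionId: "abc" } })
  .promise();
console.log(Item);
```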
Upvotes: 1