Thomas

Reputation: 8950

Do we need 3*N instances on Amazon EC2 to host N MongoDB shards?

The question might seem ridiculous, but it seems to me that a "yes" would be a little crazy.
MongoDB recommends replica sets of 3 machines. So if the database fits on 1 machine, I need 3 machines, and if tomorrow I need to shard across 2 machines, I will actually need 6, right?
Or is there something smarter that can be done and that comes for free with MongoDB? (With coding theory, e.g. Hamming codes, the number of extra bits needed is not linear in the total number of bits.)
Please don't hesitate to ask me to rephrase if what I say is not clear.
Thanks in advance for your answers,
Thomas

Upvotes: 0

Views: 94

Answers (1)

attish

Reputation: 3150

There is good documentation on the recommended cluster setup in terms of physical instance separation. At least two things should be considered separately. The first is replication; see this documentation: http://docs.mongodb.org/manual/core/replica-set-members/

This means you need at least two data nodes (for HA) in a replica set, and you can add one arbiter, which holds no data and only participates in elections, as described in the docs linked above. You need an odd number of set members because the primary has to be elected by a majority of the replica set.
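
To make the majority argument concrete, here is a small Python sketch. It is only an illustration of the arithmetic, not something taken from the MongoDB docs, and the function names are made up. It computes how many votes are needed to elect a primary and how many voting members can be down while an election is still possible.

```python
def votes_needed(voting_members: int) -> int:
    """Strict majority of the voting members of a replica set."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """Voting members that can be down while a primary can still be elected."""
    return voting_members - votes_needed(voting_members)


# Compare set sizes: (voting members, votes needed, tolerated failures)
for n in (2, 3, 4, 5):
    print(n, votes_needed(n), tolerated_failures(n))
```

With two data nodes plus one arbiter there are 3 voters, so one of them can fail. With only the two data nodes, losing either one leaves no majority. An even number of voters tolerates no more failures than the odd number just below it, which is why an odd count is recommended.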

The other aspect is sharding. Sharding needs an additional layer that maintains metadata, which is provided by extra processes: config servers and mongos routers. For a sharded production cluster, see: http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/. In this setup the three config servers have to be on separate instances. Also, the two mongos processes cannot reside on the same instance.

So for the minimal layout, the following has to be considered:

  • You must not colocate data nodes (the two data nodes of each shard have to be on separate instances)
  • The arbiter belonging to a shard's replica set has to be on a separate instance from that shard's two data nodes
  • The three config servers should reside on separate instances from each other
  • The minimum of two mongos processes have to reside on separate instances from each other
  • Although data nodes cannot be colocated, config servers and mongos processes can be on the same instances as the data nodes.

So theoretically one can lay out a sharded cluster with two shards on 4 instances without breaking any of the recommendations, like this:

Instance 1: datanode replicaset 1, configserver 1, arbiter replicaset 2

Instance 2: datanode replicaset 1, configserver 2, mongos 1

Instance 3: datanode replicaset 2, configserver 3, arbiter replicaset 1

Instance 4: datanode replicaset 2, mongos 2

Here replicaset 1 is the first shard and replicaset 2 is the second.
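
As a sanity check, the placement rules can be written down as a small Python sketch and run against the 4-instance layout. This is only an illustration: the instance names, role labels, and the `violations` helper are made up for this example and have nothing to do with any MongoDB tool.

```python
from collections import Counter

# The 4-instance layout; each process is a (kind, id) pair.
# For data nodes and arbiters the id is the replica set they belong to.
layout = {
    "instance1": [("data", "rs1"), ("config", 1), ("arbiter", "rs2")],
    "instance2": [("data", "rs1"), ("config", 2), ("mongos", 1)],
    "instance3": [("data", "rs2"), ("config", 3), ("arbiter", "rs1")],
    "instance4": [("data", "rs2"), ("mongos", 2)],
}


def violations(layout):
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    for instance, procs in layout.items():
        kinds = Counter(kind for kind, _ in procs)
        # No two data nodes, config servers, or mongos on one instance.
        for kind in ("data", "config", "mongos"):
            if kinds[kind] > 1:
                problems.append(f"{instance}: {kinds[kind]} {kind} processes")
        # An arbiter must not sit next to a data node of its own replica set.
        data_sets = {rs for kind, rs in procs if kind == "data"}
        for kind, rs in procs:
            if kind == "arbiter" and rs in data_sets:
                problems.append(f"{instance}: arbiter of {rs} next to its data node")
    return problems


print(violations(layout))  # prints [] for the layout above
```

Moving, say, the arbiter of replicaset 1 onto Instance 1 would make the check report a violation, because it would then share an instance with a data node of its own replica set.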

"datanode" is not standard MongoDB terminology; I just use it for the mongod processes that handle real data (the primaries and secondaries in a replica set). As a side note, I would not actually do this. Just start micro instances for the config servers and keep the mongos processes on the application servers.

Upvotes: 2
