Reputation: 24888
I have a capture tool which captures approximately 1000 50-byte data records per second, storing them effectively into a MongoDB collection in real time within a 1 gigabit ethernet network.
I want to take the server out onto the internet, that is, the capture source and the mongod server will no longer be on the same LAN. Throughput is still likely to be sufficient: I have 100Mbps bidirectional service between the two points, and the capture rate is 1000 * 50 * 8 = 400kbps, so there is about 250x headroom (roughly two orders of magnitude) even before allowing for protocol overhead. Latency, however, is likely to be a problem.
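To make the arithmetic concrete, here is a small sketch of both numbers. The bandwidth figures come from the paragraph above; the 50 ms round-trip time is purely an assumed value for illustration, not a measurement of the actual link.

```python
# Figures taken from the question.
RECORDS_PER_S = 1000
RECORD_BYTES = 50
LINK_BPS = 100_000_000  # 100 Mbps

# Raw payload rate, ignoring BSON, TCP, and wire-protocol overhead.
payload_bps = RECORDS_PER_S * RECORD_BYTES * 8  # 400_000 bits per second
headroom = LINK_BPS / payload_bps

# ASSUMPTION: a hypothetical 50 ms round trip between capture host and server.
ASSUMED_RTT_MS = 50

# If every write waits for its own acknowledgement on a single connection,
# at most one write completes per round trip.
max_serial_writes_per_s = 1000 / ASSUMED_RTT_MS

print(f"payload: {payload_bps} bps, headroom: {headroom:.0f}x")
print(f"serial acknowledged writes per second at {ASSUMED_RTT_MS} ms RTT: "
      f"{max_serial_writes_per_s:.0f}")
```

Under that assumed round trip, one blocking write per record would fall well short of 1000 records per second, even though the link itself is nowhere near saturated.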
Can I tune Mongo so that it does not confirm every write for a few seconds, thereby overcoming any latency issues? Does Mongo confirm every write? My tools are all written in Python using PyMongo, and they trigger an atomic write every time a data point event occurs.
Would I have to batch them up manually?
Upvotes: 0
Views: 922
Reputation: 62648
Mongo has the concept of write concerns, which basically let you specify how important a write is, which may get you speed in exchange for potential data loss.
A write with the "unacknowledged" write concern will not wait for the primary to confirm the write at all - it basically shoves the data into the write socket and goes on its way. This is very fast, but means that data could potentially get lost if the socket is closed before the data is sent, or the primary steps down before the write is processed. It is UDP-ish in this case (though it's still TCP).
The "acknowledged" and "journaled" write concerns will cause the mongo driver to block until the server has received the write and acknowledged it (and in the case of journaled, until it has been written to the on-disk journal). Each such write therefore costs at least one network round trip. This is much safer than unacknowledged writes, though slower, and there is still the potential for data loss if the primary were to step down before the op is replicated to the secondaries.
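Since a blocking write pays for a round trip, one way to keep acknowledged writes over a high-latency link is to batch records manually, as the question suggests, so that one round trip covers many records. Below is a minimal sketch of such a buffer. `RecordBatcher`, its parameters, and the thresholds are hypothetical names made up for illustration; the only assumption about the sink is that it accepts a list of documents, which PyMongo's `collection.insert_many` does.

```python
import time
from typing import Callable, List


class RecordBatcher:
    """Hypothetical helper: buffer records and hand them to `sink` in groups."""

    def __init__(self, sink: Callable[[List[dict]], object],
                 max_records: int = 1000, max_age_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self._sink = sink
        self._max_records = max_records
        self._max_age_s = max_age_s
        self._clock = clock
        self._buffer: List[dict] = []
        self._oldest = None  # time the oldest buffered record arrived

    def add(self, record: dict) -> None:
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(record)
        # The age check only runs when a record arrives, which is adequate
        # for a steady stream but not for a source that goes quiet.
        too_many = len(self._buffer) >= self._max_records
        too_old = self._clock() - self._oldest >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        # If the sink raises, the buffer is left intact so the caller can retry.
        self._sink(list(self._buffer))
        self._buffer.clear()
        self._oldest = None
```

In real use the sink would be something like `RecordBatcher(collection.insert_many)`, with `add()` called from the capture event handler and `flush()` called once on shutdown. The trade-off is that a crash of the capture process loses whatever is still in the buffer.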
The "majority" write concern will cause the mongo driver to block until the server has acknowledged the write, and a majority of nodes in the replica set have acknowledged the write. This is the slowest write mode, but is the most durable, and alleviates many eventual consistency concerns.
Why are you moving the server out of the LAN? If you just need to have it externally accessible, you could set up a replica set and replicate out of the LAN with it. That way, your writes could still be to a local primary with the "journaled" write concern and replicated out to the secondary without worrying too much about latency stuffing up the system.
Upvotes: 2