Reputation: 1640
Removing hosts from a MongoDB replica set without changing the hosts string passed to MongoClient (or MongoReplicaSetClient) for that replica set seems to break the pymongo connection when the service is restarted. The exception raised is:
pymongo.errors.ServerSelectionTimeoutError: host4:27017: [Errno -2] Name or service not known...
The problem can be distilled as follows:
from pymongo import MongoClient
hosts1 = "host1, host2, host3, host4" # where host1 and host2 are no longer available
hosts2 = "host3, host4" # only has valid hosts
hosts3 = ["host1", "host2", "host3", "host4"] # expressed as a list
client = MongoClient(hosts1, 27017, replicaset="rs0")
db = client['admin']
db.authenticate('user', 'pass')
So the script fails with hosts1 but works with hosts2 and hosts3, i.e.:
client = MongoClient(hosts2, 27017, replicaset="rs0") # works
or:
client = MongoClient(hosts3, 27017, replicaset="rs0") # works
The trouble is that the failure doesn't become apparent until the service is restarted, which might happen long after the replica set membership was changed.
The fact that it works with hosts2 suggests that the comma-and-space format of the hosts string is valid. So why does the first one fail when the service is restarted?
Upvotes: 1
Views: 1425
Reputation: 1640
The answer can be found in the split_hosts procedure of the pymongo URI parser.
The parser doesn't strip whitespace, even though the URI spec (RFC 2396, section 2.4.3) lists the space as an excluded character and notes that whitespace is often used as a delimiter. Splitting on commas alone leaves a leading space on every host name after the first, and those names fail to resolve, which causes the networking error.
The hosts2 string works because its first host has no leading space and resolves properly. The remaining entry is malformed, but the pymongo driver needs only one functioning host, which it can then use to find all the others. By the same logic, hosts1 worked until host1, its only well-formed entry, was removed from the replica set.
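The effect can be reproduced without a database. The sketch below only mimics a plain comma split, which is an assumption based on the behaviour described above and not a copy of pymongo's code; naive_split_hosts is a hypothetical name used for illustration.

```python
def naive_split_hosts(hosts):
    """Split on commas only, keeping surrounding whitespace (hypothetical helper)."""
    return hosts.split(",")


hosts1 = "host1, host2, host3, host4"  # host1 and host2 no longer exist
hosts2 = "host3, host4"                # only live hosts

parsed1 = naive_split_hosts(hosts1)
parsed2 = naive_split_hosts(hosts2)

# An entry with leading or trailing whitespace is not a resolvable host name.
well_formed1 = [h for h in parsed1 if h == h.strip()]
well_formed2 = [h for h in parsed2 if h == h.strip()]

print(parsed1)       # every entry after the first keeps its leading space
print(well_formed1)  # only host1 survives, and host1 has been removed
print(well_formed2)  # host3 survives and is still alive
```

With hosts1 the only well-formed seed is a host that no longer exists, so server selection times out. With hosts2 the single well-formed seed is still a live member.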
Thus, the answer to the problem is to remove the spaces after the commas.
hosts1 = "host1,host2,host3,host4"
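If the hosts string comes from configuration that people edit by hand, it may be safer to normalize it before it reaches the driver. The following is a minimal sketch; normalize_hosts is a hypothetical helper, not part of pymongo, and it relies on the list form of the hosts argument that the question already showed to work.

```python
def normalize_hosts(hosts):
    """Return a list of host names with whitespace and empty entries removed.

    Accepts either a comma-separated string or a list of strings.
    Hypothetical helper, not part of pymongo.
    """
    if isinstance(hosts, str):
        hosts = hosts.split(",")
    stripped = [h.strip() for h in hosts]
    return [h for h in stripped if h]


seeds = normalize_hosts("host1, host2, host3, host4")
print(seeds)

# The cleaned list could then be passed to the client, for example:
# client = MongoClient(seeds, 27017, replicaset="rs0")
```

Normalizing at the point where the configuration is read means a stray space can no longer hide behind a single well-formed seed.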
The fix is simple. The bigger problem is that the failure does not become apparent until the service is restarted, which might be a long time after the membership of the replica set has changed.
Upvotes: 2