Reputation: 1442
I have a CouchDB server running on an Amazon EC2 instance. It's the stock 1.2.0, installed from an RPM.
I also have several Android devices running couchbase-mobile-2.0.
Each device initiates a continuous push and a continuous pull replication with the server, so all of the devices should be eventually consistent.
However, when one of the mobile devices pushes a document and another device attempts to pull it, I get the following error in the pulling device's log:
E/CouchDB(9896): [error] [<0.199.0>] Replication `bf69ede4416770a1fef28ffb4c4e6950+continuous` (`treatment` -> `http://portecTest:*****@50.150.250.165:5984/treatment/`) failed: {checkpoint_commit_failure,<<"Error updating the target checkpoint document: conflict">>}
The app is designed in such a way that this document will not be edited by the other devices or by the server, so it's not a document revision conflict.
After this, no more documents will replicate, push or pull, until I restart the app (the continuous replications are initialized when the app starts). After a restart, it works again.
What does this mean? Any ideas what could be causing it?
Upvotes: 1
Views: 946
Reputation: 13216
I've seen this before because of conflicting replication ids.
When you set up a replication in the _replicator database, a _replication_id field is added to the document (e.g. bf69ede4416770a1fef28ffb4c4e6950+continuous for your current replication). The replication process uses this id to track the progress of a replication by managing a document at http://server:5984/dbname/_local/replication-id on both ends of the replication, recording things there like the last seen sequence number.
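For concreteness, here's roughly what that looks like over HTTP. This is just a sketch: the doc name push_treatment and the localhost URL are assumptions on my part, and the credentials are the redacted ones from your log.

```
# Create a replication by writing a doc to the _replicator database
curl -X PUT http://localhost:5984/_replicator/push_treatment \
  -H 'Content-Type: application/json' \
  -d '{"source": "treatment",
       "target": "http://portecTest:*****@50.150.250.165:5984/treatment/",
       "continuous": true}'

# CouchDB processes the doc and adds the generated id to it:
curl http://localhost:5984/_replicator/push_treatment
# => { ..., "_replication_id": "bf69ede4416770a1fef28ffb4c4e6950",
#      "_replication_state": "triggered", ... }

# The replicator checkpoints through a _local doc named after that id,
# kept on both the source and the target databases:
curl 'http://portecTest:*****@50.150.250.165:5984/treatment/_local/bf69ede4416770a1fef28ffb4c4e6950'
# => { "session_id": "...", "source_last_seq": 42, "history": [ ... ] }
```

Note that the conflict in your log is on exactly that _local checkpoint document on the target, not on any of your application documents.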
This is also used to spot when two replications are set up to do the same thing: the id is generated purely from the replication's parameters, so two replications with the same source and target, and no other options, will have the same id.
I'm not totally sure how the _replication_id is generated (maybe there's a seed of some sort somewhere?), but I've definitely had problems before where it was generated identically on every machine involved (they all had the same replication docs, like {source: localDb, target: remoteServer:5984/remoteDb}). They all then tried to use the same db/_local/id document, and threw huge numbers of conflicts because they were all changing that same document on the remote server simultaneously.
You can check if this is the problem by comparing the _replication_id fields on both mobile devices and seeing if they're the same.
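If the replications are started programmatically rather than through _replicator docs, the id also shows up in _active_tasks. Assuming each device's embedded CouchDB listens on 5984 (an assumption on my part):

```
# Run against each device's local CouchDB
curl http://localhost:5984/_active_tasks
# Each running replication task reports its id, e.g.
#   "replication_id": "bf69ede4416770a1fef28ffb4c4e6950+continuous"
# If two devices show the same id for the same push, they will both
# fight over the same _local checkpoint doc on the server.
```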
I fixed this by including the local machine's address in the source, so that every id was different, and the conflicts went away. That might not be practical with mobile devices, though, unless you can get a consistently working local address on each one (some generated hostname?). I couldn't find any way to set the _replication_id manually.
If this is the problem, the solution is basically to make each replication document unique from the others in some way. Could you name the databases differently on each mobile device somehow?
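For example, anything that makes the replication parameters differ per device changes the generated id. A sketch, assuming the local database can be addressed through a per-device address (the 10.0.2.15 address is purely illustrative):

```
# Start the push with a source URL unique to this device, so the
# generated replication id differs from every other device's
curl -X POST http://localhost:5984/_replicate \
  -H 'Content-Type: application/json' \
  -d '{"source": "http://10.0.2.15:5984/treatment",
       "target": "http://portecTest:*****@50.150.250.165:5984/treatment/",
       "continuous": true}'
```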
Upvotes: 1