jjmerelo

Reputation: 23527

Using CouchDB for dataflow, or how to retrieve documents with a specific revision

Imagine you want to use the revision number as a state flag for documents in a database: revision 1 is for "raw" documents, revision 2 for a certain "processed" state, and so on and so forth. What you want, then, is to retrieve only the documents at revision 1 so that they can be processed and taken to revision 2. The obvious way is to create a view that extracts the revision number from the document's _rev field, something like

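function(doc) {
  // _rev looks like "1-<hash>"; the part before the dash is the revision number
  var rev = doc._rev.split("-");
  // Emit it as an integer so that keys sort numerically
  emit(parseInt(rev[0], 10), doc);
}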

However, this requires a view. Since _rev is a built-in field, is there no more straightforward way to retrieve such documents in bulk using _all_docs?

Upvotes: 1

Views: 169

Answers (1)

Andreas Klöber

Reputation: 5920

I would recommend against abusing CouchDB revisions for that purpose. Some points:

  • You cannot manually set a specific revision, so if you want to store a document directly in stage 3, you have to add it and then apply two no-op PUTs. Furthermore, you cannot "downgrade" the revision: to achieve this you have to delete the document, add it again, and push it to the desired revision with no-op PUTs. All of this is really inflexible.
  • There is no built-in support for retrieving documents by revision "prefix".
  • If you want to modify your processing workflow by inserting an intermediate stage, you have to bump the revisions of all documents in that stage and in every subsequent stage.

I would recommend adding a dedicated attribute that denotes the document's processing stage. Another approach might be to create a separate processing database for each stage, so that requesting all documents in a specific stage can be done via _all_docs on the corresponding database. Depending on your use case, you can delete the documents from the previous database once they have been added to the next one.
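
To illustrate the first suggestion, here is a minimal sketch of a view keyed on a stage attribute. The attribute name "stage", the function names, and the sample documents are all assumptions made up for illustration. The emit() defined here is only a local stand-in so that the map function can be tried outside CouchDB; inside a design document you would keep only the body of mapByStage.

```javascript
// Local stand-in for CouchDB's emit(), so the map function can be tried in Node.
var rows = [];
var currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Map function as it would appear in a design document.
// "stage" is a hypothetical attribute name, not a CouchDB built-in.
function mapByStage(doc) {
  if (doc.stage) {
    emit(doc.stage, null); // null value: fetch bodies with include_docs=true
  }
}

// Sample documents (made up for illustration).
var docs = [
  { _id: "a", stage: "raw" },
  { _id: "b", stage: "processed" },
  { _id: "c", stage: "raw" },
  { _id: "d" } // no stage attribute, so it is not indexed
];

docs.forEach(function (doc) {
  currentId = doc._id;
  mapByStage(doc);
});

// Mimics querying the view with ?key="<stage>".
function idsInStage(stage) {
  return rows
    .filter(function (row) { return row.key === stage; })
    .map(function (row) { return row.id; });
}

console.log(idsInStage("raw"));
```

Against a real server, such a view would be queried with something like GET /mydb/_design/workflow/_view/by_stage?key="raw"&include_docs=true, where mydb, workflow and by_stage are hypothetical names. Moving a document to the next stage is then an ordinary update of its stage attribute, independent of its revision number.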

Upvotes: 1
