CouchDB performance of views vs HTTP bulk fetch

Question

I have several CouchDB databases, all in the hundreds of GB, that I need to get documents from in ways that depend on multiple databases, e.g. (pseudocode, prefix indicates which database the document is from):

for each Db1_Document in Db1
    if Db1_Document has field "Db2_match"
        Db2_Document = Db1_Document.Db2_match
        for each Db2_Reference in Db2_Document.references
            if Db2_Reference has empty field "Db1_match"
                add Db2_Reference to List bigList
        emit [Db2_Document, bigList]

I could do this with a complicated (and hacky) set of views. Or I could bulk HTTP fetch the documents I need and do my processing in Java.

How expensive is bulk HTTP fetching in comparison with making views? Is the fact that CouchDB doesn't natively support view chaining reason enough to avoid the views solution?

This is an application where efficiency is very high priority.

Will Hartung · Accepted Answer

Building a view is I/O- and CPU-intensive in CouchDB, especially since the map function has to run over ALL documents in the database.

If your logic affects all documents, then building a view will likely be the most efficient mechanism for this. If you ALREADY have a reasonably coarse view that gives you the subset you need (or a superset of that subset that is still smaller than the entire DB), it's likely better to simply fetch that subset en masse and process it locally.
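To make the "fetch en masse" option concrete, here is a minimal sketch in Java (11+), assuming you already have the IDs of the documents you want. It POSTs a list of keys to CouchDB's _all_docs endpoint with include_docs=true, so one HTTP round trip returns many documents. The class name BulkFetch, the database name db2, the document IDs, and the localhost URL are all placeholders for illustration, not anything from your setup. The JSON body is built by hand only to keep the example dependency-free; in real code you would probably use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

class BulkFetch {
    // Builds the JSON body {"keys":["id1","id2"]} expected by POST /{db}/_all_docs.
    // Only backslashes and double quotes are escaped (assumes IDs have no control characters).
    static String keysBody(List<String> ids) {
        return ids.stream()
                .map(id -> "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(",", "{\"keys\":[", "]}"));
    }

    // Builds one bulk request; include_docs=true asks CouchDB to return
    // the full document bodies rather than just IDs and revisions.
    static HttpRequest bulkRequest(String baseUrl, String db, List<String> ids) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + db + "/_all_docs?include_docs=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(keysBody(ids)))
                .build();
    }

    // Sends the request and returns the raw JSON response; needs a reachable CouchDB.
    static String fetch(HttpClient client, HttpRequest request) throws Exception {
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical IDs, e.g. the Db2_match values collected from Db1 documents.
        List<String> ids = List.of("doc-a", "doc-b");
        HttpRequest request = bulkRequest("http://localhost:5984", "db2", ids);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(keysBody(ids));
        // Pass any argument to actually send the request to a running server.
        if (args.length > 0) {
            System.out.println(fetch(HttpClient.newHttpClient(), request));
        }
    }
}
```

With hundreds of GB per database you would want to send the keys in batches rather than in one giant request. A sensible batch size depends on your document sizes, so measure it rather than guess.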
