brandones

Reputation: 1947

CouchDB performance of views vs HTTP bulk fetch

I have several CouchDB databases, each in the hundreds of GB. I need to retrieve documents in ways that depend on more than one database, e.g. (pseudocode; the prefix indicates which database a document comes from):

for each Db1_Document in Db1
    if Db1_Document has field "Db2_match"
        Db2_Document = Db1_Document.Db2_match
        bigList = empty List
        for each Db2_Reference in Db2_Document.references
            if Db2_Reference has empty field "Db1_match"
                add Db2_Reference to bigList
        emit [Db2_Document, bigList]

I could do this with a complicated (and hacky) set of views. Or I could bulk HTTP fetch the documents I need and do my processing in Java.

How expensive is bulk HTTP fetching in comparison with making views? Is the fact that CouchDB doesn't natively support view chaining reason enough to avoid the views solution?

This is an application where efficiency is a very high priority.

Upvotes: 0

Views: 299

Answers (2)

smathy

Reputation: 27971

You might find it easier to create a new DB and use filtered replication to pull the information you need from your other DBs into it, then run your query against that new DB. Your data will be a little stale, but having all your related data in a single DB makes it possible to write a view that can see all the related documents. That view will be indexed and updated incrementally as new documents arrive from the replication step.

This would provide the best of all worlds:

  • You will be able to write views that make sense.
  • You'll be pulling only the data you need into your Java code.
  • You'll still get the performance benefit at runtime, because the view is indexed and the index is updated incrementally as new data arrives from the replication.
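
To make the replication step concrete, here is a minimal sketch of starting a filtered, continuous replication through CouchDB's `_replicate` endpoint from Java 11+. The database names (`db1`, `db2`, `combined`), the server URL, and the filter name `app/needed` are all hypothetical; the filter function would have to exist in a design document in each source DB, and authentication is left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FilteredReplication {

    // Builds the JSON body for POST /_replicate.
    // Assumes the arguments contain no characters that need JSON escaping.
    static String replicationBody(String source, String target, String filter) {
        return "{"
            + "\"source\":\"" + source + "\","
            + "\"target\":\"" + target + "\","
            + "\"filter\":\"" + filter + "\","   // "designdoc/filtername"
            + "\"create_target\":true,"
            + "\"continuous\":true"
            + "}";
    }

    // Sends the replication request and returns the HTTP status code.
    static int startReplication(String couchUrl, String body) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(couchUrl + "/_replicate"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical server URL; pass a real one as the first argument to send.
        String couchUrl = args.length > 0 ? args[0] : "http://localhost:5984";
        for (String source : new String[] {"db1", "db2"}) {
            String body = replicationBody(
                couchUrl + "/" + source, couchUrl + "/combined", "app/needed");
            System.out.println(body);
            if (args.length > 0) {
                System.out.println(source + " -> HTTP " + startReplication(couchUrl, body));
            }
        }
    }
}
```

Run without arguments, it only prints the request bodies; with a server URL it also sends them. One replication per source DB, all into the same target, gives you the single combined DB to build the view on.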

Upvotes: 1

Will Hartung

Reputation: 118804

Building views is I/O and CPU intensive in Couch, especially since the build has to process ALL documents in the database.

If your logic affects all documents, then making a view will likely be the most efficient mechanism for this. If you have a reasonably coarse view ALREADY that gives you the subset (or the superset of your subset, but less than the entire DB) you need for this processing, it's likely better to simply fetch the subset you need en masse and process it locally.
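
As a rough illustration of the "fetch en masse" option, this sketch posts a list of document IDs to `_all_docs?include_docs=true`, which returns the matching documents in one round trip. It assumes Java 11+, a hypothetical database URL, no authentication, and that you already have the IDs (for example from your existing coarse view); the class and method names are made up for the example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

public class BulkFetch {

    // Builds the {"keys": [...]} body, escaping backslashes and quotes in IDs.
    static String keysBody(List<String> ids) {
        return ids.stream()
            .map(id -> "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
            .collect(Collectors.joining(",", "{\"keys\":[", "]}"));
    }

    // Fetches the listed documents in a single request; returns the raw JSON.
    static String fetchDocs(String dbUrl, List<String> ids) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(dbUrl + "/_all_docs?include_docs=true"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(keysBody(ids)))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        List<String> ids = List.of("a", "b");   // placeholder document IDs
        System.out.println(keysBody(ids));      // {"keys":["a","b"]}
        if (args.length > 0) {
            // args[0] is the database URL, e.g. http://localhost:5984/db2
            System.out.println(fetchDocs(args[0], ids));
        }
    }
}
```

For very large ID sets you would probably want to send the keys in batches rather than in one request, and parse the response with a streaming JSON parser instead of holding it as a single string.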

Upvotes: 1
