Reputation: 1947
I have several CouchDB databases, each in the hundreds of GB, and I need to pull documents from them in ways that span multiple databases, e.g. (pseudocode; the prefix indicates which database a document comes from):
    for each Db1_Document in Db1
        if Db1_Document has field "Db2_match"
            Db2_Document = Db1_Document.Db2_match
            for each Db2_Reference in Db2_Document.references
                if Db2_Reference has empty field "Db1_match"
                    add Db2_Reference to List bigList
            emit [Db2_Document, bigList]
I could do this with a complicated (and hacky) set of views. Or I could bulk HTTP fetch the documents I need and do my processing in Java.
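For reference, the client-side version of that loop might look roughly like the sketch below. It assumes the documents have already been fetched and parsed into plain `Map`s, that `Db2_match` holds the ID of a Db2 document, and that an "empty" field means missing, null, or an empty string. The `CrossDbJoin` and `Row` names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CrossDbJoin {
    /** One emitted row: a Db2 document ID plus its references that lack a Db1 match. */
    static final class Row {
        final String db2Id;
        final List<Map<String, Object>> unmatched;

        Row(String db2Id, List<Map<String, Object>> unmatched) {
            this.db2Id = db2Id;
            this.unmatched = unmatched;
        }
    }

    /** Assumption: "empty" means the field is absent, null, or an empty string. */
    static boolean isEmpty(Object value) {
        return value == null || "".equals(value);
    }

    @SuppressWarnings("unchecked")
    static List<Row> join(Iterable<Map<String, Object>> db1Docs,
                          Map<String, Map<String, Object>> db2ById) {
        List<Row> rows = new ArrayList<>();
        for (Map<String, Object> db1Doc : db1Docs) {
            Object match = db1Doc.get("Db2_match");
            if (!(match instanceof String)) {
                continue; // no Db2_match field on this Db1 document
            }
            Map<String, Object> db2Doc = db2ById.get((String) match);
            if (db2Doc == null) {
                continue; // referenced Db2 document was not fetched
            }
            List<Map<String, Object>> bigList = new ArrayList<>();
            Object refs = db2Doc.get("references");
            if (refs instanceof List) {
                for (Object ref : (List<?>) refs) {
                    if (ref instanceof Map && isEmpty(((Map<?, ?>) ref).get("Db1_match"))) {
                        bigList.add((Map<String, Object>) ref);
                    }
                }
            }
            rows.add(new Row((String) match, bigList));
        }
        return rows;
    }
}
```

The Db2 documents for each batch of Db1 documents could be fetched in one request by POSTing a `{"keys": [...]}` body to `_all_docs` with `include_docs=true`, so the cost is mostly one round trip per batch rather than one per document.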
How expensive is bulk HTTP fetching in comparison with making views? Is the fact that CouchDB doesn't natively support view chaining reason enough to avoid the views solution?
This is an application where efficiency is a very high priority.
Upvotes: 0
Views: 299
Reputation: 27971
You might find it easier/better to create a new DB and pull the information you need from your other DBs into it using filtered replication, then run your query against that combined DB. Your data will be a little stale, but the advantage of having all your related data in a single DB is that you can write a view that has visibility of all the related documents. That view will be indexed and updated incrementally as new documents arrive from the replication step.
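As a rough sketch of what setting this up could involve: a replication is started by POSTing a JSON body to CouchDB's `/_replicate` endpoint (or by saving it as a document in the `_replicator` database), and the `filter` field names a filter function stored in a design document on the source DB. The helper below only builds that body. The database names and the `app/needed_docs` filter name are placeholders, and the values are assumed not to need JSON escaping.

```java
final class FilteredReplication {
    /**
     * Builds the JSON body for a continuous, filtered replication.
     * Assumes source, target and filter contain no characters that need JSON escaping.
     */
    static String replicationBody(String source, String target, String filter) {
        return String.format(
            "{\"source\":\"%s\",\"target\":\"%s\",\"filter\":\"%s\",\"continuous\":true}",
            source, target, filter);
    }

    public static void main(String[] args) {
        // Hypothetical names: one replication per source DB into the same combined DB.
        for (String source : new String[] {"db1", "db2"}) {
            System.out.println(replicationBody(source, "combined", "app/needed_docs"));
        }
    }
}
```

Each body would be sent with `Content-Type: application/json`. With `continuous` set, CouchDB keeps feeding matching changes into the combined DB, which is what lets the view there stay incrementally up to date.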
This would provide the best of all worlds:
Upvotes: 1
Reputation: 118804
Building a view is I/O- and CPU-intensive in Couch, especially since the map function has to run over ALL documents in the database.
If your logic affects all documents, then making a view will likely be the most efficient mechanism for this. If you have a reasonably coarse view ALREADY that gives you the subset (or the superset of your subset, but less than the entire DB) you need for this processing, it's likely better to simply fetch the subset you need en masse and process it locally.
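To make the "fetch the subset en masse" option concrete, here is a minimal sketch that builds the URL for querying an existing view over a key range with `include_docs=true`, so the full documents come back in a single response. The host, database, design document, and view names are placeholders, and the keys are assumed to be plain strings without quotes or backslashes (view keys are passed as JSON, hence the surrounding double quotes).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ViewFetch {
    /** JSON-quotes a simple string key and URL-encodes it for use in a query string. */
    static String encodeKey(String key) {
        return URLEncoder.encode("\"" + key + "\"", StandardCharsets.UTF_8);
    }

    /** Builds a view query URL that returns full documents for keys in [startKey, endKey]. */
    static String viewUrl(String base, String db, String designDoc, String view,
                          String startKey, String endKey) {
        return base + "/" + db + "/_design/" + designDoc + "/_view/" + view
            + "?include_docs=true"
            + "&startkey=" + encodeKey(startKey)
            + "&endkey=" + encodeKey(endKey);
    }

    public static void main(String[] args) {
        // Hypothetical names throughout; adjust to the actual view.
        System.out.println(viewUrl("http://localhost:5984", "db2", "app", "coarse", "a", "b"));
    }
}
```

The response can then be streamed and processed locally, which avoids building a second, more specific index when the coarse one already narrows things down enough.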
Upvotes: 1