Will dynamic reduce function result in re-constructing B-Tree

Question

I am a little confused about how CouchDB operates. So far I have learned that requesting a view for the first time causes CouchDB to run the map function and construct a B-tree index, which is then referenced on subsequent requests.

What I am unsure about is this: does the B-tree get reconstructed if my reduce function returns a different document every time the view is requested?

Thank you

Marek Kowalski · Accepted Answer

The reduce() function doesn't return documents; it returns the reduced value for a given set of values. Map-reduce works as follows. The map() function is called for each document in the database. From map() you can emit() any number of view rows, with code like:

emit(key, value);

It's important to note that map() is called only once for each document revision; after that, the result is cached. The result can depend only on the document: you cannot pass any parameters from the request or emit random numbers.
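To make the shape of a map function concrete, here is a minimal sketch that can be run outside CouchDB. The emit() stub, the sample documents, and the field names (type, customer, amount) are assumptions made for illustration; inside CouchDB, emit() is provided by the view server.

```javascript
// Local stand-in for CouchDB's emit(); it only collects rows in an array.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// A map function whose output depends only on the document it receives.
// The fields `type`, `customer` and `amount` are hypothetical.
function map(doc) {
  if (doc.type === "order") {
    emit(doc.customer, doc.amount);
  }
}

// Hypothetical documents; CouchDB would call map() once per document revision.
const docs = [
  { _id: "a", type: "order", customer: "alice", amount: 10 },
  { _id: "b", type: "order", customer: "bob", amount: 5 },
  { _id: "c", type: "note", text: "ignored" }
];
docs.forEach(map);
```

Because map() reads nothing but doc, running it again on the same revision would produce the same rows, which is what makes caching the result safe.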

Then, when you query the view and a reduce() function is defined, it is called for the rows emitted for the documents matching the query's key range. Again, there is no way to pass arguments from the request. The result can depend only on the values passed to the reduce() function.

Internally, the B-tree structure is used to cache results and minimize the amount of computation needed. You can have multiple views defined with the same map() function code and different reduce() functions. CouchDB is smart enough to share the output of map() and not call it more times than needed.

I hope this clears things up a little. Good luck!

-- edit below to answer comment about selecting random row in reduce() --

In general, using randomness in map() or reduce() goes against the design of the map-reduce procedure. If you use randomness inside the map() or reduce() function, the random result will get cached. The result of reduce() is cached for different subsets of the rows. You don't know or control how many times the reduce() function is called to calculate the final result of the query. If you perform the exact same query twice, the second time might not need a single call.

Also, the design of the reduce() function requires that the following relation be satisfied:

reduce(reduce(a) + reduce(b)) = reduce(a + b)

a and b above are key-value ranges. This obviously doesn't hold if you use random inside the reduce body.
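As a rough illustration of that relation, the sketch below checks it for a simple sum, using the (keys, values, rereduce) argument shape of CouchDB reduce functions. The sample values and the randomPick function are made up for this example, and "+" in the relation is read as joining the two ranges.

```javascript
// A sum reduce: works the same on raw values and on partial results.
function sumReduce(keys, values, rereduce) {
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}

// Two hypothetical ranges of emitted values.
const a = [1, 2, 3];
const b = [4, 5];

// Reduce each range separately, then re-reduce the partial results.
const partials = [sumReduce(null, a, false), sumReduce(null, b, false)];
const viaRereduce = sumReduce(null, partials, true);

// Reduce the joined range in a single pass.
const direct = sumReduce(null, a.concat(b), false);

// A reduce that picks a random value: the two paths above are not
// guaranteed to agree for it, so cached partial results become meaningless.
function randomPick(keys, values, rereduce) {
  return values[Math.floor(Math.random() * values.length)];
}
```

Here viaRereduce and direct are equal, so CouchDB is free to combine cached partial sums in any grouping. With randomPick no such guarantee exists.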

As far as I understand, you just want to get a random row from the view. Why not use reduce="_count" in your view? Then you can perform your task in two queries:

first query to get the count of rows
second query with reduce=false&skip=random(count)&limit=1 to get the random row of the view. Here the random(count) is calculated on the client side.
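The client-side part of the second query could be sketched as follows. The host, database, design document, and view names in viewUrl are hypothetical, and the rng parameter is added here only so the example is repeatable; the count is assumed to come from the value returned by the first (reduced) query.

```javascript
// Build the URL for the second query from the row count of the first one.
// `rng` must return a number in [0, 1); it defaults to Math.random.
function randomRowUrl(viewUrl, count, rng) {
  const random = rng || Math.random;
  const skip = Math.floor(random() * count); // 0 <= skip < count
  return viewUrl + "?reduce=false&skip=" + skip + "&limit=1";
}

// Hypothetical view endpoint.
const viewUrl = "http://localhost:5984/mydb/_design/app/_view/by_type";

// Fixed rng for a repeatable example; with count = 10 it gives skip = 5.
const exampleUrl = randomRowUrl(viewUrl, 10, function () { return 0.5; });
```

Since the random number is generated on the client, the view itself stays deterministic and its cached reduce values remain valid.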
