Blue Frauenglass

Reputation: 43

Chained map/reduce in CouchDB

In CouchDB, I have a set of items like the following (simplified for example's sake):

{_id: 1, date: "July 1", user: "user1"}
{_id: 2, date: "July 2", user: "user1"}
{_id: 3, date: "July 3", user: "user2"}
...etc...

I'd like to get a list of "most recent activity", sorted by date, in which each user appears only once. I can create a view with results like so:

{key: "July 3", _id: 3, user: "user2"}
{key: "July 2", _id: 2, user: "user1"}
{key: "July 1", _id: 1, user: "user1"}

but this contains duplicate entries for the same user. Or I can create a view that maps {key: user, value: date} and reduces to

{key: "user1", mostRecentDate: "July 2"}
{key: "user2", mostRecentDate: "July 3"}

but that isn't sorted by "most recent".
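For reference, here is a minimal sketch of that second view, runnable outside CouchDB. The ISO 8601 date strings, the function names, and the local `emit` and `groupReduce` stand-ins are assumptions for illustration only; inside a design document only the map and reduce function bodies would be used.

```javascript
// Sample docs; ISO 8601 dates are an assumption so string order matches time order.
const docs = [
  { _id: "1", date: "2014-07-01", user: "user1" },
  { _id: "2", date: "2014-07-02", user: "user1" },
  { _id: "3", date: "2014-07-03", user: "user2" },
];

// Local stand-in for CouchDB's emit(), so the sketch runs outside the server.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: key = user, value = date.
function mapByUser(doc) {
  if (doc.user && doc.date) {
    emit(doc.user, doc.date);
  }
}

// Reduce function: keep the largest date. The output has the same shape as
// the input values, so the same code also covers the rereduce case.
function reduceMostRecent(keys, values, rereduce) {
  return values.reduce(function (a, b) {
    return a > b ? a : b;
  });
}

// Rough local equivalent of querying the view with ?group=true.
function groupReduce(mappedRows) {
  const byKey = {};
  mappedRows.forEach(function (row) {
    (byKey[row.key] = byKey[row.key] || []).push(row.value);
  });
  return Object.keys(byKey).sort().map(function (key) {
    return { key: key, value: reduceMostRecent(null, byKey[key], false) };
  });
}

docs.forEach(mapByUser);
const grouped = groupReduce(rows);
console.log(JSON.stringify(grouped));
```

The grouped rows come back ordered by key (the user), which is exactly why this view alone cannot be sorted by date.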

I know that the obvious solution, reducing over the results of another view, isn't supported. BigCouch supports chained map/reduce, but appears to be out of date and unsupported (last release 2012).

This seems like a rather common problem - what are some existing solutions (beyond "switch databases")?

Upvotes: 4

Views: 1131

Answers (1)

Akshat Jiwan Sharma

Reputation: 16000

Here is a general idea of how you can do chained map/reduce with CouchDB 1.x. What we want is the ability to pass the results of one map/reduce to another.

  1. Subscribe to the _changes feed filtered by the view. This gives you only the docs for which the map function actually emits something.

  2. Next, query the view for these filtered docs. This is simple because a view accepts a list of keys, so we pass the keys and get back just the matching subset of the view's results.

  3. Next, push this result into either a separate database or the same one. Bulk inserts make this faster. If you use a separate database you can even reuse the _ids from the view results, which makes the bulk updates a lot easier.

  4. Within this database we define another view that sorts our results based on value.

    {key: "user1", mostRecentDate: "July 2"}
    {key: "user2", mostRecentDate: "July 3"}

Since you have already gotten to this step, all you need to do is create a view on mostRecentDate in the second database, and you will get user activity sorted by date.
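The four steps above can be sketched roughly as follows. This is only an illustration under assumptions: the database, design doc, and view names (`activity`, `app`, `latest_by_user`), the helper function names, the ISO date strings, and the sample responses are all hypothetical, and the HTTP calls themselves are left out so the sketch runs without a server. The endpoints referred to in the comments (`_changes` with `filter=_view`, a view query with a `keys` body, and `_bulk_docs`) are the standard CouchDB ones.

```javascript
// Step 1: path for a _changes request filtered by the first view's map function.
function changesPath(db, ddoc, view, since) {
  return "/" + db + "/_changes?filter=_view&view=" + ddoc + "/" + view +
    "&include_docs=true&since=" + encodeURIComponent(since);
}

// Step 2: the first view is keyed by user, so the users of the changed docs
// are the keys to re-query (POST to the view with {"keys": [...]} and group=true).
function keysFromChanges(changesResponse) {
  const seen = {};
  changesResponse.results.forEach(function (change) {
    if (change.doc && change.doc.user) {
      seen[change.doc.user] = true;
    }
  });
  return Object.keys(seen);
}

// Step 3: build a _bulk_docs payload, reusing the view key as _id.
// Updating a doc that already exists would also need its current _rev.
function toBulkDocs(viewRows) {
  return {
    docs: viewRows.map(function (row) {
      return { _id: row.key, mostRecentDate: row.value };
    }),
  };
}

// Step 4: map function for the second database, keyed by date.
const emitted = [];
function emit(key, value) { // local stand-in for CouchDB's emit()
  emitted.push({ key: key, value: value });
}
function mapByDate(doc) {
  if (doc.mostRecentDate) {
    emit(doc.mostRecentDate, doc._id);
  }
}

// Local walk-through with made-up responses.
const changes = {
  results: [
    { seq: 1, id: "1", doc: { _id: "1", date: "2014-07-01", user: "user1" } },
    { seq: 2, id: "2", doc: { _id: "2", date: "2014-07-02", user: "user1" } },
    { seq: 3, id: "3", doc: { _id: "3", date: "2014-07-03", user: "user2" } },
  ],
  last_seq: 3,
};
const keys = keysFromChanges(changes);

// What the first view might return for those keys with group=true.
const firstViewRows = [
  { key: "user1", value: "2014-07-02" },
  { key: "user2", value: "2014-07-03" },
];
const payload = toBulkDocs(firstViewRows);
payload.docs.forEach(mapByDate);

// Equivalent of querying the second view with descending=true.
const newestFirst = emitted.slice().sort(function (a, b) {
  return a.key < b.key ? 1 : a.key > b.key ? -1 : 0;
});
console.log(changesPath("activity", "app", "latest_by_user", 0));
console.log(JSON.stringify(newestFirst));
```

Because the second database holds one doc per user, the second view has one row per user, and its key order gives the "most recent first" listing.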

I hope you are using a dummy reduce, though: one that returns null and is there only so you can query with group=true.

Using a list function in step 4 can make your life easier. Bulk updates require the list of docs to be in the form {"docs":[....]}, and a list function lets you produce that in one go.
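A list function along these lines might look like the sketch below. The name `bulkDocsList` and the output doc shape are assumptions, and `getRow`, `send`, and `start` are stubbed locally here only so the example runs outside CouchDB; on the server they are provided by the list function environment.

```javascript
// Local stand-ins for the CouchDB list function environment.
const queue = [
  { key: "user1", value: "2014-07-02" },
  { key: "user2", value: "2014-07-03" },
];
let output = "";
function getRow() {
  return queue.shift() || null;
}
function send(chunk) {
  output += chunk;
}
function start(response) {
  // On the server this sets the status and headers; nothing to do locally.
}

// The list function itself: wraps every view row into {"docs": [...]}.
function bulkDocsList(head, req) {
  var row;
  var docs = [];
  start({ headers: { "Content-Type": "application/json" } });
  while ((row = getRow())) {
    docs.push({ _id: row.key, mostRecentDate: row.value });
  }
  send(JSON.stringify({ docs: docs }));
}

bulkDocsList({}, {});
console.log(output);
```

The response body can then be posted to _bulk_docs as it is, without reshaping the view results on the client.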

Upvotes: 1
