Reputation: 3366
Consider the following document structures:
Thread:
- doc_type 1
- _id
- subject (string)
Posts:
- doc_type 2
- _id
- thread_id (_id of Thread)
- time (milliseconds since 1970)
- comment (string)
I need the threads sorted by the most recent post in each thread, together with that thread's latest 5 posts. I want to avoid updating the thread document every time a new post is added, to eliminate the possibility of conflicts in a distributed environment across DB nodes. Besides, that would be me working for the DB, when the DB should be working for me.
For simplicity, let's start with finding just the latest post. The 5 posts can be gathered the same way.
Now, I'm not sure I'm heading in the right direction. However, looking here I found how to find the last post in a thread using a reduce function with a group level that returns the thread subject taken from doc_type 1 and the last post document taken from doc_type 2.
BTW, as opposed to the sample in the link, in my case a thread is always created with a first post (so, for example, the creation date of a Thread is the date of its first Post).
map:
function(doc){
  switch(doc.doc_type){
    case 1: emit([doc._id], doc); return;
    case 2: emit([doc.thread_id], doc); return;
  }
}
reduce: in the real world the keys are more compound, so it must be used with an appropriate group level. I also ignore the re-reduce case here, just for simplicity's sake. You can find the full picture here:
function(keys, vals, rr){
  var result = { subject: null, lastPost: null, count: 0 };
  // I'll ignore the re-reduce case for simplicity
  vals.forEach(function(doc){
    switch(doc.doc_type){
      case 1:
        result.subject = doc.subject;
        return;
      case 2:
        // lastPost starts out null, so guard before comparing times
        if (!result.lastPost || result.lastPost.time < doc.time) result.lastPost = doc;
        result.count++;
        return;
    }
  });
  return result;
}
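For reference, here is a minimal sketch of how the same reduce could handle the re-reduce pass that I skipped above. It assumes that on re-reduce the values are the partial results returned by earlier calls of this same function; the name reduceLastPost and the sample documents are made up for illustration only.

```javascript
// Sketch only: on re-reduce, `vals` holds partial results
// ({ subject, lastPost, count }) instead of documents.
function reduceLastPost(keys, vals, rereduce) {
  var result = { subject: null, lastPost: null, count: 0 };

  // Keep whichever post has the greater time.
  function considerPost(post) {
    if (post && (!result.lastPost || result.lastPost.time < post.time)) {
      result.lastPost = post;
    }
  }

  vals.forEach(function (val) {
    if (rereduce) {
      if (val.subject !== null) result.subject = val.subject;
      considerPost(val.lastPost);
      result.count += val.count;
    } else if (val.doc_type === 1) {
      result.subject = val.subject;
    } else if (val.doc_type === 2) {
      considerPost(val);
      result.count++;
    }
  });
  return result;
}

// Usage with hypothetical data: two partial reductions, then a re-reduce.
var partA = reduceLastPost(null, [
  { doc_type: 1, _id: "t1", subject: "Hello" },
  { doc_type: 2, _id: "p1", thread_id: "t1", time: 100, comment: "first" }
], false);
var partB = reduceLastPost(null, [
  { doc_type: 2, _id: "p2", thread_id: "t1", time: 300, comment: "third" },
  { doc_type: 2, _id: "p3", thread_id: "t1", time: 200, comment: "second" }
], false);
var combined = reduceLastPost(null, [partA, partB], true);
```

Here combined carries the subject from the thread document, the post with the greatest time, and the total post count across both partial results.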
But how do I page the results afterwards, sorted by latest-post date? Is there a way to feed doc ids from the result of one query as the filter criteria of another (preferably in one round trip)?
There is no limit to the number of posts in a thread, so I'm a little reluctant to rely on a list function here, especially since the page size can also vary, which could result in the last post not showing up at all.
Upvotes: 1
Views: 534
Reputation: 2365
If you're only after the last post or the last five posts, there's a much simpler method. You can completely avoid the reducer, in fact.
If you add the time as the second portion of the key, you can use a combination of endkey, descending, and limit to get the last N posts based on the thread_id.
Here's the map function I wrote with some test data based on your schemas:
function(doc) {
  // doc_type is the discriminator field from the question's schema
  if (doc.doc_type) {
    if (doc.subject) {
      // Thread document
      emit([doc._id, doc.time], doc.subject);
      emit([doc._id, 'Z'], doc.subject);
    } else {
      // Post document
      emit([doc.thread_id, doc.time], {_id: doc._id});
    }
  }
}
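The strange 'Z' key is there so you can get the subject from the "bottom" of the thread's rows: in CouchDB's view collation strings sort after numbers, so [thread_id, 'Z'] comes after every [thread_id, time] key and is therefore the first row returned when descending=true.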
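The query parameters would look something like this (with descending=true the scan runs from startkey down to endkey, so a startkey at the high end of the thread's key range keeps other threads out of the result):
?startkey=["thread_id",{}]&endkey=["thread_id"]&descending=true&limit=6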
The limit should be N+1 where N is the number of posts you'd like back. In the results you'll have the thread subject and _id objects (or whatever you'd like) from the post documents.
The _id objects are output in this example so you can use them with include_docs=true if you want the full post. Emit only the other fields you actually need from the post document (title, etc.) to keep the overall index size low, and use include_docs in those places where you need the full contents of the document. However, if you always need the full post document, output it in the emit, as that will give you a faster response (though a larger index on disk).
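The query described above can be sketched as a small URL builder. This is only an illustration: the database name "forum", the design document "threads", the view name "by_thread", and the helper lastPostsUrl are all hypothetical. The sketch also assumes a startkey of [threadId, {}] so that the descending scan begins at the bottom of that thread's rows rather than at the end of the whole view.

```javascript
// Sketch only: builds the view URL for the last n posts of one thread.
// "forum", "threads" and "by_thread" are made-up names.
function lastPostsUrl(baseUrl, threadId, n) {
  var params = {
    // With descending=true the scan runs from startkey down to endkey.
    // {} sorts after numbers and strings in CouchDB view collation.
    startkey: JSON.stringify([threadId, {}]),
    endkey: JSON.stringify([threadId]),
    descending: "true",
    // One extra row for the [thread_id, 'Z'] subject row.
    limit: String(n + 1)
  };
  var query = Object.keys(params).map(function (name) {
    return name + "=" + encodeURIComponent(params[name]);
  }).join("&");
  return baseUrl + "/forum/_design/threads/_view/by_thread?" + query;
}

var url = lastPostsUrl("http://localhost:5984", "thread-1", 5);
```

Append include_docs=true to the same query if you want the full post documents rather than just the emitted values.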
Also, if you need a list of all threads sorted by last post as well as 5 posts per thread, you'd need to output keys like [time, thread_id, 'thread'] and [time, thread_id, 'post'], as the time sorting will cause threads and posts to be farther apart in the results. A _list function can then be used to collect the posts "under" each thread document again. However, doing two requests may still be easier/lighter.
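The regrouping step could look roughly like the sketch below. It is written as a plain function over an array of rows so it can be run anywhere; groupThreads and the sample rows are hypothetical, and the rows are assumed to arrive in descending key order with keys shaped like [time, thread_id, 'thread' | 'post'].

```javascript
// Sketch only: regroups time-sorted rows into threads with their latest posts.
// Assumes rows are already in descending key order.
function groupThreads(rows, postsPerThread) {
  var byId = {};
  var ordered = []; // threads in order of their most recent row
  rows.forEach(function (row) {
    var threadId = row.key[1];
    var kind = row.key[2];
    var entry = byId[threadId];
    if (!entry) {
      entry = byId[threadId] = { threadId: threadId, subject: null, posts: [] };
      ordered.push(entry);
    }
    if (kind === "thread") {
      entry.subject = row.value;
    } else if (entry.posts.length < postsPerThread) {
      entry.posts.push(row.value);
    }
  });
  return ordered;
}

// Usage with hypothetical rows, newest first.
var rows = [
  { key: [500, "t2", "post"], value: { _id: "p5" } },
  { key: [400, "t1", "post"], value: { _id: "p4" } },
  { key: [300, "t1", "post"], value: { _id: "p3" } },
  { key: [200, "t2", "thread"], value: "Second thread" },
  { key: [200, "t2", "post"], value: { _id: "p2" } },
  { key: [100, "t1", "thread"], value: "First thread" },
  { key: [100, "t1", "post"], value: { _id: "p1" } }
];
var threads = groupThreads(rows, 2);
```

Inside an actual _list function the rows would come from repeated getRow() calls instead of an array. Either way the whole view has to be read before a thread's subject row is guaranteed to have been seen, which is part of why two requests may be lighter.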
Upvotes: 1