paginating search results in mongoDB

Question

i am trying to paginate my search results in mongoDB below

{
"data": [
{
  "_id": "538037b869a1ca1c1ffc96e3",
  "jobs": "america movie"
},
{
  "_id": "538037a169a1ca1c1ffc96e0",
  "jobs": "superman movie"
},
{
  "_id": "538037a769a1ca1c1ffc96e1",
  "jobs": "spider man movie"
},
{
  "_id": "538037af69a1ca1c1ffc96e2",
  "jobs": "iron man movie"
},
{
  "_id": "538037c569a1ca1c1ffc96e4",
  "jobs": "social network movie"
}
],
 "Total_results": 5,
 "author": "Solomon David"
 }

which as been indexed and sorted by textScore so i implemented pagination like these below

app.get('/search/:q/limit/:lim/skip/:skip',function(req,res){

var l = parseInt(req.params.lim);
var s = parseInt(req.params.skip);
db.jobs.aggregate({$match:{$text:{$search:req.params.q}}},
{$sort:{score:{$meta:"textScore"}}},{$skip:s},{$limit:l},function(err,docs){res.send({data:docs,Total_results:docs.length,author:"Solomon David"});});

});

but when i tried like this localhost:3000/search/movie/limit/1/skip/0 i limit my result to 1 and skipped none so i have to get results like this below.

{
  "data": [
    {
      "_id": "538037b869a1ca1c1ffc96e3",
      "jobs": "america movie"
    }
]}

but i am getting like this

{
  "data": [
    {
      "_id": "538037a169a1ca1c1ffc96e0",
      "jobs": "superman movie"
    }
  ],
  "Total_results": 1,
  "author": "Solomon David"
}

please help me what am i doing wrong

Neil Lunn · Accepted Answer

There seem to be a few things to explain here so I'll try and step through them in turn. But the first thing to address is the document structure you are presenting. Arrays are not going to produce the results you want, so here is a basic collection structure, calling it "movies" for now:

{
    "_id" : "538037b869a1ca1c1ffc96e3",
    "jobs" : "america movie",
    "author" : "Solomon David"
}
{
    "_id" : "538037a169a1ca1c1ffc96e0",
    "jobs" : "superman movie",
    "author" : "Solomon David"
}
{
    "_id" : "538037a769a1ca1c1ffc96e1",
    "jobs" : "spider man movie",
    "author" : "Solomon David"
}
{
    "_id" : "538037af69a1ca1c1ffc96e2",
    "jobs" : "iron man movie",
    "author" : "Solomon David"
}
{
    "_id" : "538037c569a1ca1c1ffc96e4",
    "jobs" : "social network movie",
    "author" : "Solomon David"
}

So there are all of the items in separate documents, each with it's own details and "author" key as well. Let us now consider the basic text search statement, still using aggregation:

db.movies.aggregate([
    { "$match": {
        "$text": {
            "$search": "movie"
        }
    }},
    { "$sort": { "score": { "$meta": "textScore" } } }
])

That will search the created "text" index for the term provided and return the results ranked by "textScore" from that query. The form used here is shorthand for these stages which you might use to actually see the "score" values:

    { "$project": {
        "jobs": 1,
        "author": 1,
        "score": { "$meta": "textScore" }
    }},
    { "$sort": { "score": 1 }}

But the results produced on the sample will be this:

{
    "_id" : "538037a169a1ca1c1ffc96e0",
    "jobs" : "superman movie",
    "author" : "Solomon David"
}
{
    "_id" : "538037b869a1ca1c1ffc96e3",
    "jobs" : "america movie",
    "author" : "Solomon David"
}
{
    "_id" : "538037c569a1ca1c1ffc96e4",
    "jobs" : "social network movie",
    "author" : "Solomon David"
}
{
    "_id" : "538037af69a1ca1c1ffc96e2",
    "jobs" : "iron man movie",
    "author" : "Solomon David"
}
{
    "_id" : "538037a769a1ca1c1ffc96e1",
    "jobs" : "spider man movie",
    "author" : "Solomon David"
}

Actually everything there has the same "textScore" but this is the order in which MongoDB will return them. Unless you are providing some other weighting or additional sort field then that order does not change.

That essentially covers the first part of what is meant to happen with text searches. A text search cannot modify the order or filter the contents of an array contained inside a document so this is why the documents are separated.

Paging these results is a simple process, even if $skip and $limit are not the most efficient ways to go about it, but generally you won't have much other option when using a "text search".

What you seem to be trying to achieve though is producing some "statistics" about your search within your result somehow. At any rate, storing documents with items within arrays is not the way to go about this. So the first thing to look at is a combined aggregation example:

db.movies.aggregate([
    { "$match": {
        "$text": {
            "$search": "movie"
        }
    }},
    { "$sort": { "score": { "$meta": "textScore" } } },
    { "$group": {
        "_id": null,
        "data": {  
            "$push": {
                "_id": "$_id",
                "jobs": "$jobs",
                "author": "$author"
            }
        },
        "Total_Results": { "$sum": 1 },
        "author": { 
            "$push": "$author"
         }
    }},
    { "$unwind": "$author" },
    { "$group": {
        "_id": "$author",
        "data": { "$first": "$data" },
        "Total_Results": { "$first": "$Total_Results" },
        "authorCount": { "$sum": 1 }
    }},
    { "$group": {
        "_id": null,
        "data": { "$first": "$data" },
        "Total_Results": { "$first": "$Total_Results" },
        "Author_Info": {
            "$push": {
                "author": "$_id",
                "count": "$authorCount"
            }
        }
    }},        
    { "$unwind": "$data" },
    { "$skip": 0 },
    { "$limit": 2 },
    { "$group": {
        "_id": null,
        "data": { "$push": "$data" },
        "Total_Results": { "$first": "$Total_Results" },
        "Author_Info": { "$first": "$Author_Info" }
    }}
])

What you see here in many stages is that you are getting some "statistics" about your total search results in "Total_Results" and "Author_Info" as well as using $skip and $limit to select a "page" of two entries to return:

{
    "_id" : null,
    "data" : [
            {
                    "_id" : "538037a169a1ca1c1ffc96e0",
                    "jobs" : "superman movie",
                    "author" : "Solomon David"
            },
            {
                    "_id" : "538037b869a1ca1c1ffc96e3",
                    "jobs" : "america movie",
                    "author" : "Solomon David"
            }
    ],
    "Total_Results" : 5,
    "Author_Info" : [
            {
                    "author" : "Solomon David",
                    "count" : 5
            }
    ]
}

The problem here is that you can see this will become very unpractical when you have a large set of results. The key part here is that in order to get these "statistics", you need to use $group to $push all of the results into an array of a single document. That might be fine for a few hundred results or more, but for thousands there would be a significant performance drop, not to mention memory resource usage and the very real possibility of basically breaking the 16MB BSON limit for an individual document.

So doing everything in aggregation is not the most practical solution, and if you really need the "statistics" then your best option is to separate this into two queries. SO first the aggregate for "statistics":

db.movies.aggregate([
    { "$match": {
          "$text": {
              "$search": "movie"
          }
    }},
    { "$group": {
        "_id": "$author",
        "count": { "$sum": 1 }
    }},
    { "$group": {
        "_id": null,
        "Total_Results": { "$sum": "$count" },
        "Author_Info": {
            "$push": {
                "author": "$_id",
                "count": "$count"
            }
        }
    }}
])

That is basically the same thing except this time we are not storing "data" with the actual search results and not worrying about paging as this is a single record of results just providing the statistics. It very quickly gets down to a single record and more or less stays there, so this is a solution that scales.

It should also be apparent that you would not need to do this for every "page" and only need to run this with the initial query. The "statistics" can be easily cached so you can just retrieve that data with each "page" request.

All that is to do now is simply run the query per page of results desired without that "statistics", and this can be done simply using the .find() form:

db.movies.find(
    { "$text": { "$search": "movie" } },
    { "score": { "$meta": "textScore" } }
).sort({ "score": { "$meta": "textScore" } }).skip(0).limit(2)

The short lesson here is that is you want "statistics" from your search, do that in a separate step to the actual paging of results. That is pretty common practice for general database paging in as simple as a "statistic" for "Total Results".

Beyond that, other options are to look at full text search solutions external to MongoDB. These are more feature laden than the "toe in the water" implementation that MongoDB offers out of the box and will also likely offer better performance solutions for "paging" large sets of results over that $skip and $limit can offer.

paginating search results in mongoDB

Answers (1)

Related Questions