Mongoose query returning repeated results

Question

The query receives a pair of coordinates, a maximum distance radius, a "skip" integer and a "limit" integer. The function should return the closest and newest locations relative to the given position. There is no visible error in my code; however, when I call the query again, it returns repeated results. The "skip" variable is updated according to the number of results returned.

Example:

1) I make query with skip = 0, limit = 10. I receive 10 non-repeated locations.

2) Query is called again now, skip = 10, limit = 10. I receive another 10 locations with repeated results from the first query.

QUERY

Locations.find({ coordinates :
                 { $near : [ x , y ],
                   $maxDistance: maxDistance }
            })
.sort('date_created')
.skip(skip)
.limit(limit)
.exec(function(err, locations) {
    console.log("[+]Found Locations");
    callback(locations);
});

SCHEMA

var locationSchema = new Schema({
        date_created: { type: Date },
        coordinates: [],
        text: { type: String }
});

I have looked everywhere for a solution. Could the version of Mongo be the cause? I use mongoose 4.x.x and, I believe, mongodb 2.5.6. Any ideas?

Blakes Seven · Accepted Answer

There are a couple of things to consider here in the sort of results that you want, the first being that you have a "secondary" sort criterion in "date_created" to deal with.

The basic problem there is that the $near operator and similar operators in MongoDB do not at present "project" any field to indicate the "distance" from the queried location; they simply "default sort" the data by nearness. So in order to do that "secondary" sort, a field with the "distance" needs to be present in the results, and that calls for a different approach.

The second consideration is that "skip" and "limit" style paging is horrible for performance on large sets of data and should be avoided where you can. It is better to select data based on the "range" where it occurs rather than "skip" through all the results you have previously displayed.
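To illustrate the difference, here is a minimal in-memory sketch. The `docs` array and the helper names `pageBySkip` and `pageByRange` are hypothetical stand-ins, not part of MongoDB or Mongoose; a real collection would do the equivalent work on the server.

```javascript
// Hypothetical in-memory stand-in for a collection already sorted by "dist".
var docs = [
  { _id: 'a', dist: 1 },
  { _id: 'b', dist: 2 },
  { _id: 'c', dist: 3 },
  { _id: 'd', dist: 4 },
  { _id: 'e', dist: 5 }
];

// Skip-style paging: walk past `skip` items, then take `limit`.
function pageBySkip(sorted, skip, limit) {
  return sorted.slice(skip, skip + limit);
}

// Range-style paging: take `limit` items strictly after the last value seen.
// Pass null as lastDist for the first page.
function pageByRange(sorted, lastDist, limit) {
  return sorted
    .filter(function (doc) {
      return lastDist === null || doc.dist > lastDist;
    })
    .slice(0, limit);
}

console.log(pageBySkip(docs, 2, 2));  // the documents with dist 3 and 4
console.log(pageByRange(docs, 2, 2)); // the same documents, selected by range
```

Both calls return the same page here, but the range version only needs the last value seen, so it does not depend on how many documents came before. Note that a strict "greater than" would drop documents that tie on the last value, which is why the approach below also tracks the ids already seen.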

The first thing to do here is use a command that can "project" the distance into the document along with the other information. The aggregation command of $geoNear is good for this, and especially since we want to do other sorting:

var seenIds = [],
    lastDistance = null,
    lastDate = null;

Locations.aggregate(
    [
        { "$geoNear": {
            "near": [x,y],
            "maxDistance": maxDistance,
            "distanceField": "dist",
            "limit": 10
        }},
        { "$sort": { "dist": 1, "date_created": -1 } }
    ],
    function(err,results) {
        if (err) throw err; // replace with your own error handling

        results.forEach(function(result) {
            // Date objects must be compared by value, not by reference
            var sameDate = lastDate !== null &&
                result.date_created.getTime() === lastDate.getTime();

            // reset the list whenever the distance or the date changes
            if ( ( result.dist !== lastDistance ) || !sameDate ) {
                seenIds = [];
                lastDistance = result.dist;
                lastDate = result.date_created;
            }
            seenIds.push(result._id);
        });
        // save those variables to session or other persistence
        // do something with results
    }
)

That is the first iteration, where you fetch the first 10 results. Note the logic inside the loop: each document is inspected for a change in either "date_created" or the projected "dist" field now present in the document, and when one occurs the "seenIds" array is wiped of all current entries. The variables are tested and possibly updated on each iteration, and every document's "_id" is then added to "seenIds", so the list ends up holding only the documents that share the last distance and date seen.
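The same bookkeeping can be sketched as a pure function that is easy to test without a database. The name `updatePagingState` and the shape of the `state` object are assumptions made for this illustration; dates are compared with `getTime()` because two distinct `Date` objects are never equal by reference.

```javascript
// Hypothetical helper: folds one page of results into the paging state.
// state = { seenIds: Array, lastDistance: Number|null, lastDate: Date|null }
function updatePagingState(state, results) {
  var next = {
    seenIds: state.seenIds.slice(),
    lastDistance: state.lastDistance,
    lastDate: state.lastDate
  };

  results.forEach(function (result) {
    var sameDistance = result.dist === next.lastDistance;
    // Compare dates by value, not by object identity.
    var sameDate = next.lastDate !== null &&
      result.date_created.getTime() === next.lastDate.getTime();

    // A change in either field starts a fresh list of seen ids.
    if (!sameDistance || !sameDate) {
      next.seenIds = [];
      next.lastDistance = result.dist;
      next.lastDate = result.date_created;
    }
    next.seenIds.push(result._id);
  });

  return next;
}

// Usage with made-up documents: 'b' and 'c' tie on distance and date.
var page = [
  { _id: 'a', dist: 1, date_created: new Date('2015-01-03T00:00:00Z') },
  { _id: 'b', dist: 2, date_created: new Date('2015-01-02T00:00:00Z') },
  { _id: 'c', dist: 2, date_created: new Date('2015-01-02T00:00:00Z') }
];
var pagingState = updatePagingState(
  { seenIds: [], lastDistance: null, lastDate: null },
  page
);
console.log(pagingState.seenIds); // [ 'b', 'c' ]
```

Only the documents that tie with the last one on the page remain in `seenIds`, which is what keeps the list short.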

All three of those variables need to be stored somewhere to await the next request. For web applications the session store is ideal, but approaches vary. You just want those values to be recalled when the next request starts, as on the next and subsequent iterations we alter the query a bit:

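// seenIds, lastDistance and lastDate are restored from the previous
// request here (lastDate must be a Date object again)
Locations.aggregate(
    [
        { "$geoNear": {
            "near": [x,y],
            "maxDistance": maxDistance,
            "minDistance": lastDistance,
            "distanceField": "dist",
            "limit": 10,
            "query": {
                "_id": { "$nin": seenIds },
                "date_created": { "$lt": lastDate }
            }
        }},
        { "$sort": { "dist": 1, "date_created": -1 } }
    ],
    function(err,results) {
        if (err) throw err; // replace with your own error handling

        results.forEach(function(result) {
            // Date objects must be compared by value, not by reference
            var sameDate = lastDate !== null &&
                result.date_created.getTime() === lastDate.getTime();

            if ( ( result.dist !== lastDistance ) || !sameDate ) {
                seenIds = [];
                lastDistance = result.dist;
                lastDate = result.date_created;
            }
            seenIds.push(result._id);
        });
        // save those variables to session or other persistence
        // do something with results
    }
)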

Here the "minDistance" parameter is added because you want to exclude the "nearer" results that have already been seen. Additional checks are placed in "query": "date_created" must be "less than" the "lastDate" recorded, since we sort in descending order, and as a final "sure" filter any "_id" values recorded in the list are excluded, because the distance and date of those documents had not changed.
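One way to keep the first and subsequent requests in a single code path is to build the pipeline from the stored values. This is only a sketch: `buildGeoNearPipeline` and the `state` object are hypothetical names, the option names are copied from the queries in this answer, and whether a given server version accepts each option (for example "limit" or "minDistance" inside $geoNear) should be checked against its documentation.

```javascript
// Hypothetical helper: builds the aggregation pipeline for one page.
// state is null for the first page, otherwise
// { seenIds: Array, lastDistance: Number, lastDate: Date }.
function buildGeoNearPipeline(x, y, maxDistance, limit, state) {
  var geoNear = {
    near: [x, y],
    maxDistance: maxDistance,
    distanceField: 'dist',
    limit: limit
  };

  if (state && state.lastDistance !== null) {
    // Advance the minimum distance past what was already shown.
    geoNear.minDistance = state.lastDistance;
    geoNear.query = {
      // Exclude documents that tie with the last one already seen.
      _id: { $nin: state.seenIds },
      date_created: { $lt: state.lastDate }
    };
  }

  return [
    { $geoNear: geoNear },
    { $sort: { dist: 1, date_created: -1 } }
  ];
}

// Usage with made-up values.
var firstPage = buildGeoNearPipeline(10, 20, 5, 10, null);
var nextPage = buildGeoNearPipeline(10, 20, 5, 10, {
  seenIds: ['b', 'c'],
  lastDistance: 2,
  lastDate: new Date('2015-01-02T00:00:00Z')
});
console.log(JSON.stringify(nextPage, null, 2));
```

The returned array would then be passed to `Locations.aggregate(...)` in place of the literal pipeline.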

Now with geospatial data that "seenIds" list is not likely to grow large, as generally you are not going to find many things at exactly the same distance, but this is the general process for paging a sorted list of data, so it is worth understanding the concept.

So if you want to be able to use a secondary field to sort on with geospatial data and also considering the "near" distance then this is the general approach, by projecting a distance value into the document results as well as storing the last seen values before any changes that would not make them unique.

The general concept is "advancing the minimum distance" to enable each page of results to get gradually "further away" from the source point of origin used in the query.
