Reputation: 155

MongoDB: Query on the last N documents (some portion) of a collection only

In my collection that has say 100 documents, I want to run the following query:

collection.find({"$text": {"$search": "some_string"}})

Assume that a suitable "text" index already exists. My question is: how can I run this query on the last 'n' documents only?

All the questions I found on the web ask how to get the last n docs, whereas my question is how to search within the last n docs only.

More generally, how can I run a Mongo query on some portion, say 20%, of a collection?

What I tried

I'm using pymongo, so I tried skip() and limit() to get the last n documents, but I didn't find a way to run a query on the cursor that those functions return.

After @hhsarh's answer, here's what I tried, to no avail:

# here's what I tried after initial answers
recents = information_collection.aggregate([
                {"$match" : {"$text" : {"$search" : "healthline"}}},
                {"$sort" : {"_id" : -1}},
                {"$limit" : 1},
            ])

The result still comes from the whole collection instead of just the last record/document, which is what the code above attempts.

The last document doesn't contain "healthline" in any field, so the intended result of the query is an empty list []. But I still get a document.

Can someone please explain how this is possible?

Upvotes: 2

Answers (2)

turivishal

Reputation: 36104

This is not possible with the $text operator, because of this restriction:

The $match stage that includes a $text must be the first stage in the pipeline

This means we can't limit the documents before the $text operator runs. Read more in the $text operator restrictions.
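The effect of stage order can be illustrated without a database. The following is a plain-Python simulation, not MongoDB code: the `docs` list, the `matches` helper (a crude substring stand-in for $text) and the `last_n` helper are all made up for the illustration. It shows why matching first and limiting afterwards returns the most recent matching document rather than an empty result.

```python
# Plain-Python stand-in for a collection; _id grows with insertion order.
docs = [
    {"_id": 1, "body": "healthline article"},
    {"_id": 2, "body": "another healthline post"},
    {"_id": 3, "body": "unrelated text"},
]

def matches(doc, term):
    # Crude substitute for $text: case-insensitive substring test.
    return term.lower() in doc["body"].lower()

def last_n(items, n):
    # Equivalent of {"$sort": {"_id": -1}} followed by {"$limit": n}.
    return sorted(items, key=lambda d: d["_id"], reverse=True)[:n]

# Order used in the question: match first, then sort and limit.
match_then_limit = last_n([d for d in docs if matches(d, "healthline")], 1)

# Intended order: sort and limit first, then match.
limit_then_match = [d for d in last_n(docs, 1) if matches(d, "healthline")]

print(match_then_limit)  # [{'_id': 2, 'body': 'another healthline post'}]
print(limit_then_match)  # []
```

Because $text has to come first, a pipeline that uses it always behaves like the first variant.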

A second option, which may work, is to search with regular expressions ($regex) instead of the $text operator.

If you need the search to behave similarly to $text, you have to transform your search input as follows:

let searchInput be your input variable
list the fields to search in searchFields
split the search input string on whitespace, loop over the resulting words and convert each one to a regular expression
loop over searchFields and prepare an $in condition for each field
import re

searchInput = "This is search"
searchFields = ["field1", "field2"]
searchRegex = []
searchPayload = []

# one case-insensitive regular expression per word of the input
for s in searchInput.split(): searchRegex.append(re.compile(s, re.IGNORECASE))
# one $in condition per field, matching any of the words
for f in searchFields: searchPayload.append({ f: { "$in": searchRegex } })
print(searchPayload)

Now your payload would look like this (regular expressions shown in shell notation),

[
  {'field1': {'$in': [/This/i, /is/i, /search/i]}}, 
  {'field2': {'$in': [/This/i, /is/i, /search/i]}}
]

Pass the searchPayload variable to the $or operator in a $match stage placed last in the pipeline:

recents = information_collection.aggregate([
  # -1 = descending (newest first), which keeps the last documents; use 1 for the oldest ones
  { "$sort": { "_id": -1 } }, 
  # use any limit of number as per your requirement
  { "$limit": 10 }, 
  { "$match": { "$or": searchPayload } }
])

print(list(recents))
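As a minimal sketch, the steps above can be wrapped in one helper that only builds the pipeline. The function name `build_last_n_pipeline` and its parameters are hypothetical, and the use of `re.escape` is an added assumption: it treats each word literally, so input such as "C++" does not break the pattern.

```python
import re

def build_last_n_pipeline(search_input, search_fields, n):
    """Return a pipeline that searches only the n most recent documents."""
    # re.escape keeps characters such as "." or "+" from acting as regex syntax.
    patterns = [re.compile(re.escape(word), re.IGNORECASE)
                for word in search_input.split()]
    conditions = [{field: {"$in": patterns}} for field in search_fields]
    return [
        {"$sort": {"_id": -1}},           # newest first
        {"$limit": n},                    # keep only the last n documents
        {"$match": {"$or": conditions}},  # search inside that slice only
    ]

pipeline = build_last_n_pipeline("C++ guide", ["field1", "field2"], 10)
# With pymongo: results = list(information_collection.aggregate(pipeline))
```

Keeping the builder free of any database call makes it easy to inspect the pipeline before sending it to the server.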

Note: The $regex search can cause performance issues, because case-insensitive, unanchored patterns generally cannot use an index efficiently.

To improve search performance you can try creating a compound index on your search fields, for example with pymongo:

information_collection.create_index([("field1", 1), ("field2", 1)])

Upvotes: 2

hhharsha36

Reputation: 3349

What you are looking for can be achieved with a MongoDB aggregation pipeline.

Note: As pointed out by @turivishal, $text won't work if it is not in the first stage of the aggregation pipeline.

collection.aggregate([
  {
    "$sort": {
      "_id": -1
    }
  },
  {
    "$limit": 10  // `n` value, where n is the number of last records you want to consider
  },
  {
    "$match" : {
      // All your find query goes here
    }
  },
], { allowDiskUse: true })  // just in case the computation exceeds 100MB

Since _id is indexed by default, the sort can use that index and the above aggregation query should be fast. However, its cost grows in proportion to n, because the $match stage has to examine each of the n documents.

Note: If you are using pymongo, replace the last line of the code example with the line below, and change the // comments to # comments

], allowDiskUse=True)
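For the more general case in the question (some percentage of the collection), n can be derived from the document count. This is only a sketch under assumptions: `last_percent_size` is a hypothetical helper, and the `$match` filter on `field1` is a placeholder for your own query.

```python
import math

def last_percent_size(total_documents, percent):
    """Number of documents in the most recent `percent` of a collection."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # $limit requires a positive number, so never return less than 1.
    return max(1, math.ceil(total_documents * percent / 100))

# With pymongo the total could come from collection.count_documents({}).
n = last_percent_size(100, 20)

pipeline = [
    {"$sort": {"_id": -1}},
    {"$limit": n},
    {"$match": {"field1": "value"}},  # placeholder filter
]
# results = list(collection.aggregate(pipeline, allowDiskUse=True))
```

The count and the aggregation are two separate calls, so documents inserted in between can shift the slice slightly.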

Upvotes: 2
