Reputation: 155
In a collection that has, say, 100 documents, I want to run the following query:
collection.find({"$text": {"$search": "some_string"}})
Assume that a suitable "text" index already exists. My question is: how can I run this query on the last 'n' documents only?
All the questions I found on the web ask how to get the last n docs, whereas my question is how to search within the last n docs only.
More generally, my question is: how can I run a mongo query on some portion, say 20%, of a collection?
What I tried
I'm using pymongo, so I tried skip() and limit() to get the last n documents, but I didn't find a way to run a query on the cursor that those functions return.
After @hhsarh's answer, here's what I tried, to no avail:
# here's what I tried after initial answers
recents = information_collection.aggregate([
{"$match" : {"$text" : {"$search" : "healthline"}}},
{"$sort" : {"_id" : -1}},
{"$limit" : 1},
])
The result still comes from the whole collection instead of just the last record/document, which is what the code above attempts.
The last document doesn't contain "healthline" in any field, so the intended result of the query is an empty list []. Instead, I get a document back.
Can someone please explain how this is possible?
Upvotes: 2
Views: 565
Reputation: 36104
It is not possible with the $text operator, because of this restriction:
The $match stage that includes a $text must be the first stage in the pipeline.
This means we can't limit the documents before the $text operator runs; read more about the $text operator restrictions.
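To make the effect of stage order concrete, here is a small in-memory sketch. It does not talk to MongoDB: the sample documents, the `body` field, and the substring test standing in for a text search are all assumptions made for illustration.

```python
# In-memory stand-in for a collection; _id grows with insertion order.
docs = [
    {"_id": 1, "body": "healthline article on sleep"},
    {"_id": 2, "body": "notes about indexing"},
    {"_id": 3, "body": "unrelated newest document"},
]


def matches(doc, term):
    """Crude stand-in for a text search: case-insensitive substring test."""
    return term.lower() in doc["body"].lower()


def match_then_limit(docs, term, n):
    """Order forced by $text: filter everything, then keep the newest n hits."""
    hits = [d for d in docs if matches(d, term)]
    return sorted(hits, key=lambda d: d["_id"], reverse=True)[:n]


def limit_then_match(docs, term, n):
    """Order the question wants: keep the newest n documents, then filter them."""
    newest = sorted(docs, key=lambda d: d["_id"], reverse=True)[:n]
    return [d for d in newest if matches(d, term)]


print(match_then_limit(docs, "healthline", 1))  # newest matching document
print(limit_then_match(docs, "healthline", 1))  # empty: newest doc has no match
```

The first function mirrors the pipeline in the question, which is why it returns a document even though the most recent one does not contain the search term.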
A second option: this might be possible if you use the $regex regular expression operator instead of the $text operator for searching.
If you need word-by-word matching similar to the $text operator, you have to transform your search input as below:
searchInput is your input string
searchFields is the list of fields to search
Split searchInput into words, loop over searchFields, and prepare an $in condition for each field:
import re

searchInput = "This is search"
searchFields = ["field1", "field2"]
# one case-insensitive pattern per word of the input
searchRegex = [re.compile(s, re.IGNORECASE) for s in searchInput.split()]
# one {field: {"$in": [patterns]}} condition per searched field
searchPayload = [{f: {"$in": searchRegex}} for f in searchFields]
print(searchPayload)
The payload then looks like this (regular expressions shown in mongo shell notation):
[
{'field1': {'$in': [/This/i, /is/i, /search/i]}},
{'field2': {'$in': [/This/i, /is/i, /search/i]}}
]
Pass searchPayload to the $or operator in a $match stage placed last in the pipeline:
recents = information_collection.aggregate([
# 1 = ascending, -1 = descending; descending puts the most recent documents first
{ "$sort": { "_id": -1 } },
# use any limit of number as per your requirement
{ "$limit": 10 },
{ "$match": { "$or": searchPayload } }
])
print(list(recents))
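The regex approach can also be wrapped in two small helpers, sketched below. The function names `build_regex_payload` and `last_n_regex_pipeline` are hypothetical, and the `re.escape` call is an added assumption so that each word is matched literally rather than interpreted as a pattern. The helpers only build the pipeline; nothing is sent to a server.

```python
import re


def build_regex_payload(search_input, search_fields):
    """Return one {field: {"$in": [patterns]}} condition per field."""
    # re.escape is an assumption: it treats every word as literal text.
    patterns = [
        re.compile(re.escape(word), re.IGNORECASE)
        for word in search_input.split()
    ]
    return [{field: {"$in": patterns}} for field in search_fields]


def last_n_regex_pipeline(search_input, search_fields, n):
    """Keep the newest n documents (by _id), then apply the regex conditions."""
    return [
        {"$sort": {"_id": -1}},
        {"$limit": n},
        {"$match": {"$or": build_regex_payload(search_input, search_fields)}},
    ]


pipeline = last_n_regex_pipeline("This is search", ["field1", "field2"], 10)
print(pipeline)
```

With pymongo, the result would be passed as `information_collection.aggregate(pipeline)`. Note that an empty field list produces an empty `$or` array, which MongoDB rejects, so callers should check for that case first.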
Note: $regex regular expression searches can cause performance issues.
A compound index on your search fields may help, although case-insensitive regular expressions generally cannot use an index efficiently. With pymongo:
information_collection.create_index([("field1", 1), ("field2", 1)])
Upvotes: 2
Reputation: 3349
What you are looking for can be achieved using the MongoDB aggregation framework.
Note: As pointed out by @turivishal, $text won't work if it is not in the first stage of the aggregation pipeline.
collection.aggregate([
{
"$sort": {
"_id": -1
}
},
{
"$limit": 10 // `n` value, where n is the number of last records you want to consider
},
{
"$match" : {
// All your find query goes here
}
},
], { allowDiskUse: true }) // in case a stage exceeds the 100 MB memory limit
Since _id is indexed by default, the sort in the above aggregation query should be fast. However, the $match stage has to examine every one of the n documents, so performance degrades in proportion to the value of n.
Note: Replace the last line in the code example with the below line if you are using pymongo
], allowDiskUse=True)
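For the more general "some portion, say 20%" case from the question, n can be derived from the collection size before building the same pipeline. This is a minimal sketch: the helper name `last_percent_pipeline` is hypothetical, and the only driver call it relies on is pymongo's `count_documents`. The count and the later aggregation are two separate operations, so n can be slightly off if documents are inserted in between.

```python
import math


def last_percent_pipeline(collection, percent, match):
    """Build a pipeline that filters only the newest `percent` % of documents.

    `collection` needs a count_documents(filter) method, as pymongo
    collections have. `match` is an ordinary filter and must not use $text.
    """
    total = collection.count_documents({})
    # $limit requires a positive value, so never go below 1.
    n = max(1, math.ceil(total * percent / 100))
    return [
        {"$sort": {"_id": -1}},
        {"$limit": n},
        {"$match": match},
    ]


class FakeCollection:
    """Stand-in used only to exercise the helper without a server."""

    def __init__(self, size):
        self.size = size

    def count_documents(self, filter):
        return self.size


print(last_percent_pipeline(FakeCollection(100), 20, {"field1": "x"}))
```

With a real pymongo collection, the returned list would be passed to `collection.aggregate(...)`, optionally with `allowDiskUse=True`.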
Upvotes: 2