Reputation: 4162
suppose my DB gives me a query result as:
{'_id': ObjectId('5c99d76a32aacf180485c3b3'),
'text': 'ILLUSTRATION : 1\nFind the quotient and remainder q and r for the pairs of positive integers given below:\n(i) 23,4\n(ii) 81,3\n(iii) 12,5\nUTION.\n',
'text2': '',
'parent': None,
'repost': 3,
'time': datetime.datetime(2010, 5, 9, 16, 5, 27, 838000)}
I want to get the first 1000 documents where either the length of text
or length of text2
is >=5:
I can do this via Python but it'll be foolish thing to do:
objects = []
i = 0
for obj in db.essays.find():
if len(obj['text']>=5) or len(obj['text2']>=5):
objects.append(obj)
i+=1
if i==1000:
break
I know it is so foolish.
I can use limit(1000)
if I have the exact matches but I do not how to get documents based on value length.
EDIT: Somehow I managed to do a PATCH as:
{ "$or":[{"$expr": { "$gt": [ { "$strLenCP": "$text" }, 5 ]}},
{"$expr": { "$gt": [ { "$strLenCP": "$text2" }, 5 ]}},
{"$expr": { "$gt": [ { "$strLenCP": "$text3" }, 5 ]}},
]}
But when I use the AND
operation to get documents when all of the texts are less than 3 in length, it throws an error:
{ "$and":[{"$expr": { "$lt": [ { "$strLenCP": "$text" }, 5 ]}},
{"$expr": { "$lt": [ { "$strLenCP": "$text2" }, 5 ]}},
{"$expr": { "$lt": [ { "$strLenCP": "$text3" }, 5 ]}},
]}
it works with limit(2)
but fails with >2 and throws an error as:
`OperationFailure: $strLenCP requires a string argument, found: null`
Upvotes: 2
Views: 338
Reputation: 22316
You can use a pipeline with strLenCP
db.collection.aggregate([
{
"$match": {
"$expr": {
"$or": [
{
"$gte": [
{
"$strLenCP": {
"$ifNull": [
"$text",
""
]
}
},
5
]
},
{
"$gte": [
{
"$strLenCP": {
"$ifNull": [
"$text2",
""
]
}
},
5
]
}
]
}
}
},
{
"$limit": 1000
}
])
However if you really care above performance the best way would be to preprocess that information:
{
'_id': ObjectId('5c99d76a32aacf180485c3b3'),
'text': 'ILLUSTRATION : 1\nFind the quotient and remainder q and r for the pairs of positive integers given below:\n(i) 23,4\n(ii) 81,3\n(iii) 12,5\nUTION.\n',
'text2': '',
'parent': None,
'repost': 3,
'time': datetime.datetime(2010, 5, 9, 16, 5, 27, 838000),
'text_len': 100,
"text2_len": 0
}
So now a simple query would suffice:
db.essays.find({"$or": [{"text_len": {"$gte": 5}}, {"text2_len": {"$gte": 5}}]}).limit(1000)
Upvotes: 5
Reputation: 3349
You can make use of the $strLenCP
Aggregation Pipeline.
Here is a sample code illustration:
objects = []
for obj in db.essays.aggregate([
{
"$project": {
"text1Len": {
"$strLenCP": "$text"
},
"text2Len": {
"$strLenCP": "$text2"
},
"docRoot": "$$ROOT",
}
},
{
"$match": {
"$or": [
{"text1Len": {"$gte": 5}},
{"text2Len": {"$gte": 5}},
]
}
},
{
"$limit": 1000
},
{
"$replaceRoot": {
"newRoot": "$docRoot"
}
},
]):
objects.append(obj)
Upvotes: 2