Querying the collection and documents based on the value length in Pymongo

Question

suppose my DB gives me a query result as:

{'_id': ObjectId('5c99d76a32aacf180485c3b3'),
 'text': 'ILLUSTRATION : 1
Find the quotient and remainder q and r for the pairs of positive integers given below:
(i) 23,4
(ii) 81,3
(iii) 12,5
UTION.
',
 'text2': '',
 'parent': None,
 'repost': 3,
 'time': datetime.datetime(2010, 5, 9, 16, 5, 27, 838000)}

I want to get the first 1000 documents where either the length of text or length of text2 is >=5:

I can do this via Python but it'll be foolish thing to do:

objects = []
i = 0
for obj in db.essays.find():
    if len(obj['text']>=5) or len(obj['text2']>=5):
        objects.append(obj)
        i+=1
    if i==1000:
        break

I know it is so foolish.

I can use limit(1000) if I have the exact matches but I do not how to get documents based on value length.

EDIT: Somehow I managed to do a PATCH as:

{ "$or":[{"$expr": { "$gt": [ { "$strLenCP": "$text" }, 5 ]}},
                                     {"$expr": { "$gt": [ { "$strLenCP": "$text2" }, 5 ]}},
                                    {"$expr": { "$gt": [ { "$strLenCP": "$text3" }, 5 ]}},
                                     ]}

But when I use the AND operation to get documents when all of the texts are less than 3 in length, it throws an error:

{ "$and":[{"$expr": { "$lt": [ { "$strLenCP": "$text" }, 5 ]}},
                                     {"$expr": { "$lt": [ { "$strLenCP": "$text2" }, 5 ]}},
                                    {"$expr": { "$lt": [ { "$strLenCP": "$text3" }, 5 ]}},
                                     ]}

it works with limit(2) but fails with >2 and throws an error as:

`OperationFailure: $strLenCP requires a string argument, found: null`

Tom Slabbaert · Accepted Answer

You can use a pipeline with strLenCP

db.collection.aggregate([
  {
    "$match": {
      "$expr": {
        "$or": [
          {
            "$gte": [
              {
                "$strLenCP": {
                  "$ifNull": [
                    "$text",
                    ""
                  ]
                }
              },
              5
            ]
          },
          {
            "$gte": [
              {
                "$strLenCP": {
                  "$ifNull": [
                    "$text2",
                    ""
                  ]
                }
              },
              5
            ]
          }
        ]
      }
    }
  },
  {
    "$limit": 1000
  }
])

However if you really care above performance the best way would be to preprocess that information:

{
'_id': ObjectId('5c99d76a32aacf180485c3b3'),
 'text': 'ILLUSTRATION : 1
Find the quotient and remainder q and r for the pairs of positive integers given below:
(i) 23,4
(ii) 81,3
(iii) 12,5
UTION.
',
 'text2': '',
 'parent': None,
 'repost': 3,
 'time': datetime.datetime(2010, 5, 9, 16, 5, 27, 838000),
  'text_len': 100,
  "text2_len": 0
}

So now a simple query would suffice:

db.essays.find({"$or": [{"text_len": {"$gte": 5}}, {"text2_len": {"$gte": 5}}]}).limit(1000)

Mongo Playground

Querying the collection and documents based on the value length in Pymongo

Answers (2)

Related Questions