AvyWam

Reputation: 970

How to correctly design a regular expression in pymongo?

I use python 3.7.1 (default, Dec 14 2018, 19:28:38), and pymongo 3.7.2.

In mongodb this works:

db.collection.find(
    {$and:[
    {"field":{$regex:"bon?"}},
    {"field":{$not:{$regex:"bon souple"}}},
    {"field":{$not:{$regex:"bon léger"}}}
    ]}
    )

So in pymongo I did the same as:

db.collection.find(
    {"$and":[
    {"field":{"$regex":"bon?"}},
    {"field":{"$not":{"$regex":"bon souple"}}},
    {"field":{"$not":{"$regex":"bon léger"}}}
    ]}
    )

but it raises pymongo.errors.OperationFailure: $regex has to be a string.

So I tried this as proposed here:

liste_reg=[
{'field': {'$regex': {'$not': re.compile('bon souple')}}}, 
{'field': {'$regex': {'$not': re.compile('bon léger')}}}, 
{'field': {'$regex': re.compile('bon?')}}
]
rslt=list(
    db.collection.find({"$and":liste_reg})
)

I noticed that the same error is raised even when the pattern contains no special characters:

liste_reg=[
{'field': {'$regex': {'$not': re.compile('bon souple')}}} #where no special char is present
]
rslt=list(
    db.collection.find({"$and":liste_reg})
)

So I tried wrapping the pattern in "/" delimiters:

liste_reg=[
{'field': {'$regex': {'$not':'/bon souple/'}}} #where no special char is present
#even tried re.compile('/bon souple/')
]
rslt=list(
    db.collection.find({"$and":liste_reg})
)

The same error, pymongo.errors.OperationFailure: $regex has to be a string, still occurs.

What can I do?

UPDATE ON MY SEARCH FOR A SOLUTION

The core of the issue seems to be $not, because when I run:

liste_reg=[{'field': {'$regex': 'bon?'}}]
rslt=list(
    db.collection.find({"$and":liste_reg})
)
len(rslt)  # gives 23 013, which is OK.

There is no error.

SOME SAMPLES

As asked by Emma, I can give a sample, which should make my request clearer. Normally the field should only contain these values:

The main problem is that my spider did not parse the pages correctly, because the script I wrote for it was not robust enough. Instead of obtaining just "bon", I obtain this kind of result:

{"_id":"ID1",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\tnon",
...}

and that is one example among many other parsing errors. That is why I want the results that match "bon?" but not "bon souple" or "bon léger", because those two are correct values, with no \n or \t.

Some samples:

[{"_id":"ID1",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\tnon"},
{"_id":"ID2",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\tpremière"},
{"_id":"ID3",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t2ème"},
{"_id":"ID4",
"field":"bon souple"},
{"_id":"ID5",
"field":"bon léger"}]

Upvotes: 2

Views: 1658

Answers (3)

jetilling

Reputation: 56

I just ran into this same issue.

Try doing this:

liste_reg=[
{'field': {'$not': re.compile('bon souple')}}, 
{'field': {'$not': re.compile('bon léger')}}, 
{'field': {'$regex': re.compile('bon?')}}
]
rslt=list(
    db.collection.find({"$and":liste_reg})
)

I just removed the $regex part of the query.
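A minimal sketch of how the filter above could be sanity-checked without a running MongoDB server. The matches_locally helper is hypothetical (it is not part of pymongo) and only emulates the $and, $not and $regex operators used here, with Python's re module standing in for the server's regex engine. The sample values are shortened versions of those in the question.

```python
import re

# Sample documents based on the question (whitespace runs shortened).
docs = [
    {"_id": "ID1", "field": "bon\r\n\t\t\r\n\t\tnon"},
    {"_id": "ID2", "field": "bon\r\n\t\t\r\n\t\tpremière"},
    {"_id": "ID4", "field": "bon souple"},
    {"_id": "ID5", "field": "bon léger"},
]

# Same shape as the filter above: "$not" receives the compiled pattern directly.
query = {
    "$and": [
        {"field": {"$not": re.compile("bon souple")}},
        {"field": {"$not": re.compile("bon léger")}},
        {"field": {"$regex": re.compile("bon?")}},
    ]
}


def matches_locally(doc, query):
    """Hypothetical helper: emulate only the operators used in this query."""
    for clause in query["$and"]:
        for key, cond in clause.items():
            value = doc.get(key, "")
            # "$not" with a pattern rejects documents where the pattern is found.
            if "$not" in cond and cond["$not"].search(value):
                return False
            # "$regex" requires the pattern to be found somewhere in the value.
            if "$regex" in cond and not cond["$regex"].search(value):
                return False
    return True


matched_ids = [d["_id"] for d in docs if matches_locally(d, query)]
print(matched_ids)
```

With a live collection, the same query dict would be passed to db.collection.find(query) as shown above.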

Background

I tried doing {item["type"]: {"$not": item['name']}} and pymongo returned a $not needs a regex or a document error.

So, I tried: {item["type"]: {"$not": {"$regex": item['name']}}} and pymongo returned a $not cannot have a regex error.

I found this SO answer, https://stackoverflow.com/a/20175230/9069964, and here's what finally worked for me:

item_name = item["name"]
{item["type"]: {"$not": re.compile(item_name)}}

I had to ditch the "$regex" part and give "$not" my regex stuff.

Upvotes: 4

chuck_sum

Reputation: 111

Try using a raw string literal with a positive lookahead. The example below should work as long as a carriage return (\r) immediately follows 'bon'.

import re
bon = re.compile(r'bon(?=\r)')
db.collection.find({'field': bon})
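As a rough local check of the lookahead, the same compiled pattern can be run against a few values from the question using only Python's re module. The samples dict and the lookahead_ids name are just for illustration, and the whitespace runs are shortened.

```python
import re

# Same pattern as above: "bon" only when a carriage return follows it.
bon = re.compile(r"bon(?=\r)")

# Shortened sample values from the question, keyed by _id.
samples = {
    "ID1": "bon\r\n\t\t\r\n\t\tnon",
    "ID3": "bon\r\n\t\t\r\n\t\t2ème",
    "ID4": "bon souple",
    "ID5": "bon léger",
}

# Keep the ids whose value contains "bon" directly followed by "\r".
lookahead_ids = [doc_id for doc_id, value in samples.items() if bon.search(value)]
print(lookahead_ids)
```

The correctly parsed values ("bon souple", "bon léger") have a space after "bon", so the lookahead skips them without needing $not.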

Upvotes: 1

Emma

Reputation: 27723

Here, we might be able to solve this problem without using the $not feature. For instance, if we wish to exclude bon souple and bon léger, which are bon followed by a space, we could maybe use an expression similar to:

"bon[^\s].+"

I'm not sure what we wish to extract here, but I was guessing that we might want to capture bon values that are not followed by a space and that sit between double quotes (").

Also, we would likely want to look into the regex query requirements and adjust our expressions to them, if necessary, such as by escaping or by using a capturing group:

(bon[^\s].+)

or:

"(bon[^\s].+)"

or:

\"(bon[^\s].+)\" 

or:

([\s\S]*?)\"(bon[^\s].+)\"

I'm not quite sure if this would be what we might want or if it would be relevant, yet according to this documentation, we can try using:

{ name: { $regex: /([\s\S]*?)\"(bon[^\s].+)\"/, $options: "mi" } }

or:

{ name: { $regex: '([\s\S]*?)\"(bon[^\s].+)\"', $options: "mi" } }

db.collection.find

db.collection.find({"field":{ $regex: /(bon[^\s].+)/, $options: "mi" }})

or:

db.collection.find({"field":{ $regex: /(bon[^\s].+)/, $options: "si" }})
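The shell queries above use /.../ regex literals, which do not exist in Python. A hedged sketch of how the same filter might be written as a PyMongo-style dict, plus a local approximation of what it matches: the compile_like_mongo helper, the FLAG_MAP table and the value "bonne qualité" are hypothetical, and Python's re module may differ from the server's regex engine in edge cases.

```python
import re

# PyMongo-style form of the last shell query: pattern as a string, flags in "$options".
query = {"field": {"$regex": r"(bon[^\s].+)", "$options": "si"}}

# Assumed mapping from MongoDB option letters to Python re flags.
FLAG_MAP = {"i": re.I, "m": re.M, "s": re.S}


def compile_like_mongo(cond):
    """Hypothetical helper: build a Python pattern from a $regex/$options pair."""
    flags = 0
    for letter in cond.get("$options", ""):
        flags |= FLAG_MAP[letter]
    return re.compile(cond["$regex"], flags)


pattern = compile_like_mongo(query["field"])

# One made-up value plus two values based on the question (whitespace shortened).
values = ["bonne qualité", "bon souple", "bon\r\n\t\tnon"]
emma_matches = [v for v in values if pattern.search(v)]
print(emma_matches)
```

In this local check, only the value where "bon" is directly followed by a non-whitespace character matches. Because \r counts as whitespace for [^\s], values such as "bon\r\n..." are not selected by this expression.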

Reference:

PyMongo $in + $regex

Performing regex Queries with pymongo

Upvotes: 0
