Reputation: 970
I use python 3.7.1 (default, Dec 14 2018, 19:28:38), and pymongo 3.7.2.
In mongodb this works:
db.collection.find(
{$and:[
{"field":{$regex:"bon?"}},
{"field":{$not:{$regex:"bon souple"}}},
{"field":{$not:{$regex:"bon léger"}}}
]}
)
So in pymongo I did the same as:
db.collection.find(
{"$and":[
{"field":{"$regex":"bon?"}},
{"field":{"$not":{"$regex":"bon souple"}}},
{"field":{"$not":{"$regex":"bon léger"}}}
]}
)
but it raises pymongo.errors.OperationFailure: $regex has to be a string.
So I tried this as proposed here:
liste_reg=[
{'field': {'$regex': {'$not': re.compile('bon souple')}}},
{'field': {'$regex': {'$not': re.compile('bon léger')}}},
{'field': {'$regex': re.compile('bon?')}}
]
rslt=list(
db.collection.find({"$and":liste_reg})
)
I noticed that the same error is raised even when the pattern contains no special character:
liste_reg=[
{'field': {'$regex': {'$not': re.compile('bon souple')}}} #where no special char is present
]
rslt=list(
db.collection.find({"$and":liste_reg})
)
So I tried to use "/" delimiters, as in:
liste_reg=[
{'field': {'$regex': {'$not':'/bon souple/'}}} #where no special char is present
#even tried re.compile('/bon souple/')
]
rslt=list(
db.collection.find({"$and":liste_reg})
)
The same error, pymongo.errors.OperationFailure: $regex has to be a string, still occurs.
What can I do?
UPDATE ON MY SEARCH FOR A SOLUTION
The core of the issue seems to be $not, because when I do:
liste_reg=[{'field': {'$regex': 'bon?'}}]
rslt=list(
db.collection.find({"$and":liste_reg})
)
len(rslt)  # gives 23 013, which is correct
There is no error.
SOME SAMPLES
As Emma asked, here is a sample that should make my mongo request clearer. Normally the field should contain only these values:
The main problem is that my spider did not parse the pages correctly, because the script I wrote was not robust enough. Instead of obtaining just "bon", I obtain this kind of result:
{"_id":"ID1",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\tnon",
...}
and that is one example among many other badly parsed values.
That is why I want the results that match "bon?" but not "bon souple" or "bon léger", because those two are correct values with no \n or \t.
So as samples:
[{"_id":"ID1",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\tnon"},
{"_id":"ID2",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\tpremière"},
{"_id":"ID3",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t2ème"},
{"_id":"ID4",
"field":"bon souple"},
{"_id":"ID5",
"field":"bon léger"}]
Upvotes: 2
Views: 1658
Reputation: 56
I just ran into this same issue.
Try doing this:
import re

liste_reg = [
    # $not takes the compiled pattern directly, with no nested $regex
    {'field': {'$not': re.compile('bon souple')}},
    {'field': {'$not': re.compile('bon léger')}},
    {'field': {'$regex': re.compile('bon?')}}
]
rslt = list(
    db.collection.find({"$and": liste_reg})
)
I just removed the $regex part from the $not clauses of the query.
Background
I tried doing {item["type"]: {"$not": item['name']}} and pymongo returned a "$not needs a regex or a document" error.
So I tried {item["type"]: {"$not": {"$regex": item['name']}}} and pymongo returned a "$not cannot have a regex" error.
I found this SO answer, https://stackoverflow.com/a/20175230/9069964, and here's what finally worked for me:
item_name = item["name"]
{item["type"]: {"$not": re.compile(item_name)}}
I had to ditch the "$regex" part and give "$not" my regex stuff.
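To make the shape of that filter concrete, here is a minimal sketch that builds it and checks it locally against shortened copies of the question's sample values. The helper names build_filter and matches_locally are hypothetical, and the local check only approximates what the server does, since it uses Python's re module rather than MongoDB's regex engine.

```python
import re

# Shortened copies of the sample documents from the question.
samples = [
    {"_id": "ID1", "field": "bon\r\n\t\t\r\n\tnon"},
    {"_id": "ID2", "field": "bon\r\n\t\t\r\n\tpremière"},
    {"_id": "ID4", "field": "bon souple"},
    {"_id": "ID5", "field": "bon léger"},
]


def build_filter(include, excludes, key="field"):
    """Build an $and filter: key matches `include` and none of `excludes`."""
    clauses = [{key: {"$regex": re.compile(include)}}]
    # $not receives the compiled pattern directly, without a nested $regex.
    clauses += [{key: {"$not": re.compile(p)}} for p in excludes]
    return {"$and": clauses}


def matches_locally(doc, include, excludes, key="field"):
    """Rough local stand-in for the filter, using Python's re module."""
    value = doc[key]
    return bool(re.search(include, value)) and not any(
        re.search(p, value) for p in excludes
    )


excluded = ["bon souple", "bon léger"]
query = build_filter("bon?", excluded)
kept = [d["_id"] for d in samples if matches_locally(d, "bon?", excluded)]
print(kept)

# With a live connection (not run here):
# rslt = list(db.collection.find(query))
```

Only the badly parsed documents survive the local check, which is the set the question is trying to isolate.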
Upvotes: 4
Reputation: 111
Try using a raw string with a positive lookahead. The example below should work as long as you have a carriage return (\r) immediately after 'bon'.
import re
bon = re.compile(r'bon(?=\r)')
db.collection.find({'field': bon})
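As a quick local check of that pattern, the sketch below runs it with Python's re module over a few values modeled on the question's samples. The values list is illustrative (the first entry is a shortened copy of a badly parsed value), and the server-side result could differ if the stored data differs from these samples.

```python
import re

# "bon" only when the very next character is a carriage return.
bon = re.compile(r"bon(?=\r)")

values = [
    "bon\r\n\t\t\r\n\tnon",  # badly parsed value (shortened)
    "bon souple",            # correct value
    "bon léger",             # correct value
    "bon",                   # correct value with nothing after it
]

flagged = [v for v in values if bon.search(v)]
print(len(flagged))
```

The lookahead is zero-width, so the match itself is just "bon"; the \r is required but not consumed.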
Upvotes: 1
Reputation: 27723
Here we might be able to solve this problem without using the $not feature. For instance, if we wish to exclude bon souple or bon léger, which are bon followed by a space, we could maybe use an expression similar to:
"bon[^\s].+"
I'm not sure what we wish to extract here, but I was guessing that we want to match bon values that are not followed by a space and that sit between the " characters.
Also, we would likely want to look into the regex query requirements and adjust our expressions to them if necessary, such as by escaping or by using a capturing group:
(bon[^\s].+)
or:
"(bon[^\s].+)"
or:
\"(bon[^\s].+)\"
or:
([\s\S]*?)\"(bon[^\s].+)\"
I'm not quite sure if this would be what we might want or if it would be relevant, yet according to this documentation, we can try using:
{ name: { $regex: /([\s\S]*?)\"(bon[^\s].+)\"/, $options: "mi" } }
or:
{ name: { $regex: '([\s\S]*?)\"(bon[^\s].+)\"', $options: "mi" } }
db.collection.find({"field":{ $regex: /(bon[^\s].+)/, $options: "mi" }})
or:
db.collection.find({"field":{ $regex: /(bon[^\s].+)/, $options: "si" }})
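Before relying on these expressions, it may be worth checking them against the sample values from the question, because \s matches \r, \n and \t as well as a space. The sketch below does that locally with Python's re module, which can differ in small ways from the engine MongoDB uses. The variant pattern, which excludes only a literal space, is an assumption of this sketch and not part of the expressions above.

```python
import re

# Shortened copies of the sample values from the question.
samples = {
    "ID1": "bon\r\n\t\t\r\n\tnon",
    "ID2": "bon\r\n\t\t\r\n\tpremière",
    "ID3": "bon\r\n\t\t\r\n\t2ème",
    "ID4": "bon souple",
    "ID5": "bon léger",
}

# The expression suggested above, with the "mi" options.
suggested = re.compile(r"bon[^\s].+", re.MULTILINE | re.IGNORECASE)

# Hypothetical variant: reject only a literal space after "bon".
variant = re.compile(r"bon[^ ]")

suggested_ids = [k for k, v in samples.items() if suggested.search(v)]
variant_ids = [k for k, v in samples.items() if variant.search(v)]

print("suggested:", suggested_ids)
print("variant:", variant_ids)
```

On these samples the suggested expression selects nothing, since the badly parsed values have \r right after bon, while the variant keeps the badly parsed values and skips bon souple and bon léger.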
Upvotes: 0