Reputation: 8482
\s regex wildcard doesn't match all types of space in MongoDB (v4.0.3)
> db.test.insertOne({ "mail" : "special [email protected]" })
> db.test.insertOne({ "mail" : "normal [email protected]" })
> db.test.find({ mail: / / }, { _id: 0, mail: 1 })
{ "mail" : "special [email protected]" }
> db.test.find({ mail: /\s/ }, { _id: 0, mail: 1 })
{ "mail" : "normal [email protected]" }
The space in special [email protected] above is a special space, while the one in normal [email protected] is a normal space.
Is this expected, or a bug? Is there any way to make it match all spaces?
Sidenote: I am running the regex inside $not, so I can't use $regex.
Edit: Even [^\S] doesn't match both strings:
> db.test.find({ mail: /[^\S]/ }, { _id: 0, mail: 1 })
{ "mail" : "normal [email protected]" }
Does mongo regex only work with ASCII?
Upvotes: 2
Views: 1751
Reputation: 37038
MongoDB uses the PCRE flavour: https://docs.mongodb.com/manual/reference/operator/query/regex/#op._S_regex
The PCRE pattern documentation (https://www.pcre.org/original/doc/html/pcrepattern.html) reads:
The default \s characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space (32), which are defined as white space in the "C" locale. This list may vary if locale-specific matching is taking place. For example, in some locales the "non-breaking space" character (\xA0) is recognized as white space, and in others the VT character is not.
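For contrast, here is a small sketch of how a JavaScript engine treats the same input. It assumes the "special" space is a non-breaking space (U+00A0), which the question does not state, and it uses made-up sample strings. The explicit character class only imitates PCRE's default \s set from the quote above; it does not run PCRE itself.

```javascript
// Assumption: the "special" space is a non-breaking space (U+00A0).
const normal = "normal user";
const special = "special\u00a0user";

// PCRE's default \s set (HT, LF, VT, FF, CR, space), spelled out explicitly.
const pcreDefaultSpace = /[\t\n\v\f\r ]/;

const results = {
  ecmaNormal: /\s/.test(normal),                    // ECMAScript \s, ASCII space
  ecmaSpecial: /\s/.test(special),                  // ECMAScript \s, U+00A0
  pcreLikeNormal: pcreDefaultSpace.test(normal),    // ASCII-only set, ASCII space
  pcreLikeSpecial: pcreDefaultSpace.test(special),  // ASCII-only set, U+00A0
};

console.log(results);
```

Only the last check is false: ECMAScript's \s covers U+00A0, while the ASCII-only set does not, which matches the behaviour seen on the server.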
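You can replace \s with
[\s\x{00a0}\x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005}\x{2006}
\x{2007}\x{2008}\x{2009}\x{200a}\x{2028}\x{2029}\x{202f}\x{205f}\x{3000}\x{feff}]
(split for readability) for compatibility with the ECMAScript regex flavour, where \s also matches these Unicode spaces.
The codes are wrapped in {} because PCRE reads at most two hex digits after a bare \x, so \x2000 would be parsed as \x20 followed by the literal characters 00.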
For your query it would be:
db.test.find({ mail: /[\s\x{00a0}\x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005}\x{2006}\x{2007}\x{2008}\x{2009}\x{200a}\x{2028}\x{2029}\x{202f}\x{205f}\x{3000}\x{feff}]/ }, { _id: 0, mail: 1 })
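If typing the class by hand feels error-prone, it can be generated from a list of code points. This is only a sketch: the names EXTRA_SPACES and pcreSpaceClass are made up for illustration, and the shell usage in the trailing comment is an assumption that relies on $not accepting a regex object.

```javascript
// Code points that ECMAScript's \s covers beyond PCRE's default ASCII set.
// (Hypothetical helper names; the list mirrors the pattern above.)
const EXTRA_SPACES = [
  0x00a0, 0x1680, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
  0x2007, 0x2008, 0x2009, 0x200a, 0x2028, 0x2029, 0x202f, 0x205f, 0x3000,
  0xfeff,
];

// Build a PCRE character class such as [\s\x{00a0}\x{1680}...].
function pcreSpaceClass(codePoints) {
  const escapes = codePoints.map(function (cp) {
    // Zero-pad to four hex digits; all code points here are <= 0xffff.
    return "\\x{" + ("0000" + cp.toString(16)).slice(-4) + "}";
  });
  return "[\\s" + escapes.join("") + "]";
}

const anySpace = pcreSpaceClass(EXTRA_SPACES);
console.log(anySpace);

// Assumed mongo shell usage, including the $not case from the question:
// db.test.find({ mail: new RegExp(anySpace) }, { _id: 0, mail: 1 })
// db.test.find({ mail: { $not: new RegExp(anySpace) } }, { _id: 0, mail: 1 })
```

The pattern is built as a plain string so the \x{...} escapes reach the server unchanged, where PCRE interprets them.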
Upvotes: 4