Reputation: 33
I've got a large collection of text data stored in MondoDB that users can query via keyword or phrase, and have an issue where some data has unicode character U+00A0 (no-break space) instead of a regular space.
Fixing up the data not being an option (those nbsps are there intentionally), I still want the user to be able to search and find that data. So I updated our Mongo query-building code to search for any whitespace [\s] in places where the user entered a space, resulting in a query like so:
{ "tt" : { "$elemMatch" : { "x" : { "$regex" : "high[\s]performance" , "$options" : "i"} }}}
(there's more to the query, that's just the relevant bit).
Unfortunately, this doesn't return the expected results. So I play around with a bunch of other ways to accomplish this, and eventually discover that I get the correct results when I search for "not non-whitespace" [^\S], as so:
{ "tt" : { "$elemMatch" : { "x" : { "$regex" : "high[^\S]performance" , "$options" : "i"} }}}
Which leads to my question -- why does "any whitespace" ("\s") fail finding this text while "not-non whitespace" ("^\S") finds it successfully? Does Mongo have a different set of rules for what counts as whitespace and non-whitespace?
Data is all in UTF-8 throughout, MongoDB version is 2.2.2
Upvotes: 3
Views: 10582
Reputation: 64563
I suppose that the problem here is with \
, not with spaces. Can you please write \\
to prove my conjecture?
Upvotes: 7