MongoDB querying whitespace with regex

Question

I've got a large collection of text data stored in MongoDB that users can query via keyword or phrase, and have an issue where some data has the Unicode character U+00A0 (no-break space) instead of a regular space.

Fixing up the data not being an option (those nbsps are there intentionally), I still want the user to be able to search and find that data. So I updated our Mongo query-building code to search for any whitespace [\s] in places where the user entered a space, resulting in a query like so:

{ "tt" : { "$elemMatch" : { "x" : { "$regex" : "high[\s]performance" , "$options" : "i"} }}}

(there's more to the query, that's just the relevant bit).
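
For illustration only, the space-to-whitespace substitution described above might look roughly like the sketch below. `buildPhrasePattern` is a hypothetical helper, not the actual query-building code, and the escaping of regex metacharacters is an added assumption about how user input would be handled.

```javascript
// Hypothetical helper: a sketch, not the query-building code from the question.
function buildPhrasePattern(phrase) {
  // Escape regex metacharacters so the user's input is matched literally.
  const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Replace each run of spaces with a whitespace class. The backslash is
  // doubled in the source so the resulting string contains a literal "\s".
  return escaped.replace(/ +/g, "[\\s]");
}

// Field names "tt" and "x" are taken from the query shown above.
const query = {
  tt: {
    $elemMatch: {
      x: { $regex: buildPhrasePattern("high performance"), $options: "i" },
    },
  },
};

console.log(query.tt.$elemMatch.x.$regex); // high[\s]performance
```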

Unfortunately, this doesn't return the expected results. So I played around with a bunch of other ways to accomplish this, and eventually discovered that I get the correct results when I search for "not non-whitespace" [^\S], like so:

{ "tt" : { "$elemMatch" : { "x" : { "$regex" : "high[^\S]performance" , "$options" : "i"} }}}

Which leads to my question: why does "any whitespace" ("\s") fail to find this text while "not non-whitespace" ("^\S") finds it successfully? Does Mongo have a different set of rules for what counts as whitespace and non-whitespace?

The data is UTF-8 throughout; the MongoDB version is 2.2.2.

Igor Chubin · Accepted Answer

I suppose that the problem here is with \, not with spaces. Can you please write \\ to prove my conjecture?
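
To make the conjecture concrete, here is a minimal sketch of what happens if the pattern passes through a JavaScript string literal (as it would in the mongo shell) with a single backslash. It is an assumption that the original query went through this kind of escaping; the sample text is made up, and the matching runs in JavaScript's own regex engine rather than on the MongoDB server.

```javascript
// Made-up sample text containing U+00A0 (no-break space) between the words.
const sample = "high\u00A0performance";

// Single backslash: "\s" and "\S" are not string escapes in JavaScript,
// so the backslash is silently dropped before any regex engine sees it.
const single = "high[\s]performance";
const singleNegated = "high[^\S]performance";

// Double backslash: the string keeps a literal backslash.
const escaped = "high[\\s]performance";

console.log(single);        // high[s]performance
console.log(singleNegated); // high[^S]performance
console.log(escaped);       // high[\s]performance

const matches = (pattern) => new RegExp(pattern, "i").test(sample);

console.log(matches(single));        // false: [s] only matches the letter "s"
console.log(matches(singleNegated)); // true: [^S] matches anything except "s"/"S"
// JavaScript's \s includes U+00A0; the server's regex engine may differ.
console.log(matches(escaped));       // true
```

If this is what is happening, the first query is really looking for "highsperformance", while the second accepts any character other than "s" between the two words, which would explain why only the second one finds the text.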
