Ray

Reputation: 21905

Process boolean phrase with Regex

I am processing user input on a search page. If the user selects an 'All Words' type search, then I remove any boolean search operators from the search text and stick ' AND ' between each real word. Pretty simple in most cases. However, I can't figure out how to remove two boolean operators in a row.

Here is my code:

// create the regex
private static Regex _cleaner =
     new Regex("(\\s+(and|or|not|near)\\s+)|\"", 
          RegexOptions.Compiled | RegexOptions.IgnoreCase);

// call the regex
_cleaner.Replace(searchText, " ")

The problem occurs when a user enters a search string like 'coffee and not tea'. The regex removes the 'and' but not the 'not', so the resulting string is 'coffee not tea'. What I want is 'coffee tea'.

The white space is required in the regex so I don't remove 'and', 'or', etc when embedded in real words (like 'band' or 'corps').

I have temporarily resolved this by calling the clean method twice, which will remove two operators in a row (which is probably all I would ever need). But it is not very elegant, is it? I would really like to do it right. I feel like I am missing something simple...

Upvotes: 1

Views: 167

Answers (4)

agent-j

Reputation: 27943

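Your regex misses the second operator because it requires whitespace on both sides of each term. In 'coffee and not tea' there is only one space between 'and' and 'not'. The match for 'and' consumes that space, so nothing is left to satisfy the leading \s+ in front of 'not', and only ' and ' is matched.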

Consider this regex:

@"(?:and|or|not|near)\s+|"""
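To see the consumed whitespace in action, here is a minimal sketch that runs the pattern from the question and lists what it actually matches. The class name OverlapDemo and the method MatchesIn are made up for illustration; only the pattern itself comes from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OverlapDemo
{
    // The pattern from the question, unchanged.
    private static readonly Regex Original =
        new Regex("(\\s+(and|or|not|near)\\s+)|\"", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the text of every match, in order,
    // so it is easy to see which operators were (and were not) found.
    public static string[] MatchesIn(string text)
    {
        MatchCollection matches = Original.Matches(text);
        var found = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            found[i] = matches[i].Value;
        }
        return found;
    }
}
```

Calling OverlapDemo.MatchesIn("coffee and not tea") should return a single match, " and ". Regex matches never overlap, so the space after 'and' cannot also serve as the space before 'not'.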

Upvotes: 0

MRAB

Reputation: 20664

Try adding word boundaries:

"\\b(and|or|not|near)\\b|\""
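As a rough sketch of how the word-boundary pattern could fit the whole 'All Words' flow from the question: strip the operators, split on whitespace, then join with ' AND '. The class name SearchCleaner and the method ToAllWordsQuery are hypothetical, and the split-and-join step is an assumption about what the asker does afterwards, not something shown in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SearchCleaner
{
    // \b matches at a word boundary without consuming any characters,
    // so adjacent operators can each be matched independently.
    private static readonly Regex Cleaner =
        new Regex("\\b(and|or|not|near)\\b|\"",
            RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Hypothetical helper: removes boolean operators and double quotes,
    // then puts " AND " between the remaining words.
    public static string ToAllWordsQuery(string searchText)
    {
        // Replace each operator or quote with a space so neighbouring words stay apart.
        string stripped = Cleaner.Replace(searchText, " ");

        // Splitting with RemoveEmptyEntries discards the extra spaces left behind.
        string[] words = stripped.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        return string.Join(" AND ", words);
    }
}
```

With this, 'coffee and not tea' should come out as 'coffee AND tea', while 'band' and 'corps' are left alone because the letters around the embedded 'and' and 'or' are word characters. One thing to keep in mind: \b also treats punctuation as a boundary, so the 'and' in something like 'rock-and-roll' would be removed too.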

Upvotes: 3

zellio

Reputation: 32524

Wouldn't just adding a + fix the problem?

private static Regex _cleaner = 
    new Regex("(\\s+(and|or|not|near)\\s+)+|\"", 
              RegexOptions.Compiled | RegexOptions.IgnoreCase);

// call the regex
_cleaner.Replace(searchText, " ")

Upvotes: 0

Nahydrin

Reputation: 13517

Change your regex to the following:

private static Regex _cleaner = new Regex("(\\s+(and|or|not|near)\\s+)*|\"", RegexOptions.Compiled | RegexOptions.IgnoreCase);

Upvotes: 1
