RegEx to exclude match if a certain word is present, but not another partial word

Question

Our firewall uses the keyword "cum" to block adult sites. The problem is that this works a little too well: it also blocks any URL containing the word "document".

The firewall accepts regex strings, so I tried this:

^.*(?!document)cum.*$

But it still matches "document". I have a feeling I should be using a pipe (|), but I don't understand how.
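To see why that attempt fails: the negative lookahead (?!document) is evaluated at the position immediately before "cum", where the upcoming text is "cum...", which can never spell "document", so the assertion always succeeds. A quick check, using Python's re module only as a stand-in for the firewall's regex engine (an assumption; the firewall's flavor is unknown):

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r"^.*(?!document)cum.*$")

# Just before "cum", the lookahead asks "do the next characters spell
# 'document'?". They spell "cum...", so the check passes and the whole
# pattern still matches a URL that only contains "document".
print(bool(attempt.search("examplemydocuments.com")))  # True
```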

What I want is a match anywhere *cum* is found in the URL (or domain name), but NOT when it is part of the word "document" or "documents".

Is this possible? As I understand it, a word boundary (\b) doesn't work here, because "cum" won't necessarily be set off by non-word characters in a URL, and definitely not in a domain name.
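For illustration, a sketch with Python's re module (again standing in for the firewall's engine, an assumption): \b only matches at a transition between a word character and a non-word character, so inside a run of letters such as "funnycumsite" there is no boundary to anchor on. As a result, \bcum\b misses every URL from the "Don't allow" list:

```python
import re

# \b requires a word/non-word transition on each side of "cum".
boundary = re.compile(r"\bcum\b")

should_block = [
    "funnycumsiteexample.com",          # letters on both sides of "cum"
    "cumallovereverythingexample.com",  # boundary before, but "a" right after
    "exampleseemycum.com",              # "y" right before, "." after
]

for url in should_block:
    print(url, bool(boundary.search(url)))  # False for every URL
```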

Here's another way to put it:

Allow "examplesearchdocuments.com"
Allow "examplemydocuments.com"
Allow "documentexample.com"
Allow "example.com/somedocuments"
Don't allow "funnycumsiteexample.com"
Don't allow "cumallovereverythingexample.com"
Don't allow "exampleseemycum.com"

where "cum" is the bad-word match. Sorry if any of these examples are real sites; I don't know how else to convey this.

deltree · Accepted Answer

Per the comments, my original answer was wrong.

If you put a lookbehind inside a negative lookahead, you can match "cum" only when it is not part of the word "document":

cum(?!(?<=docum)ent)
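A minimal sketch of the pattern above checked against the question's own examples, using Python's re module as a stand-in for the firewall (an assumption; the firewall's engine must support lookbehind, which not every regex flavor does). Case-insensitive matching is also an assumption, chosen because domain names are case-insensitive:

```python
import re

# After "cum" is consumed, the negative lookahead asks: are the 5 characters
# ending here "docum", AND are the next characters "ent"? Only when both hold
# (we are inside "document") is the match rejected.
# re.IGNORECASE is an assumption, not part of the original answer.
BAD_WORD = re.compile(r"cum(?!(?<=docum)ent)", re.IGNORECASE)


def is_blocked(url: str) -> bool:
    """Return True if the URL contains "cum" outside the word "document"."""
    return BAD_WORD.search(url) is not None


# Example URLs taken from the question.
allowed = [
    "examplesearchdocuments.com",
    "examplemydocuments.com",
    "documentexample.com",
    "example.com/somedocuments",
]
blocked = [
    "funnycumsiteexample.com",
    "cumallovereverythingexample.com",
    "exampleseemycum.com",
]

for url in allowed + blocked:
    print(f"{url}: {'block' if is_blocked(url) else 'allow'}")
```

Note that "cum" at the very start of a string is still blocked: the lookbehind cannot find five characters before it, so the inner assertion fails and the negative lookahead succeeds.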

For background on lookaround, see http://www.regular-expressions.info/lookaround.html

Here it is run against a large number of test strings:

http://www.rubular.com/r/b5iZrn6Cjz
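The pipe mentioned in the question fits naturally here if more harmless words need excluding: each alternative inside the negative lookahead pairs its own fixed-width lookbehind with the text that must follow "cum". In this sketch "cucumber" is a hypothetical extra exclusion, not one from the question, and Python's re again stands in for the firewall's engine:

```python
import re

# Hypothetical extension: exclude "cucumber" as well as "document".
# Each alternative is "<lookbehind ending in cum><text after cum>";
# if any alternative matches, this occurrence of "cum" is rejected.
BAD_WORD = re.compile(r"cum(?!(?<=docum)ent|(?<=cucum)ber)", re.IGNORECASE)

for url in ["mydocuments.com", "cucumberrecipes.example", "funnycumsiteexample.com"]:
    print(url, "block" if BAD_WORD.search(url) else "allow")
```

Each lookbehind stays fixed-width on its own, which keeps the pattern valid in engines (such as Python's re) that reject variable-width lookbehind.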
