user2063949

Reputation: 61

Java & Regex: Matching a substring that is not preceded by specific characters

This is one of those questions that has been asked and answered hundreds of times over, but I'm having a hard time adapting other solutions to my needs.

In my Java application I have a method for censoring bad words in chat messages. It works for most of my words, but there is one particular (and popular) curse word that I can't seem to get rid of. The word is "faen" (simply modern slang for "Satan" in the language in question).

Using the pattern "fa+e+n" for matching multiple A's and E's actually works; however, in this language, the word for "that couch" or "that sofa" is "sofaen". I've tried a lot of different approaches, using variations of [^so] and (?!=so), but so far I haven't been able to find a way to match one and not the other.
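For reference, "(?!=so)" is a negative lookahead for the literal text "=so", so it can never rule out the "so" in front of the word. The Java regex construct for "not preceded by" is a negative lookbehind, "(?<!so)". A minimal sketch of that idea (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

class LookbehindDemo {
    // "(?<!so)" is a negative lookbehind: the match may not start right after "so".
    // "(?!=so)" would instead be a negative lookahead for the text "=so",
    // which is why it never excluded "sofaen".
    static final Pattern FAEN = Pattern.compile("(?<!so)fa+e+n", Pattern.CASE_INSENSITIVE);

    // True if the text contains f, one or more a's, one or more e's and n,
    // unless that f comes directly after "so".
    static boolean containsFaen(String text) {
        return FAEN.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsFaen("String containing faen"));   // true
        System.out.println(containsFaen("String containing sofaen")); // false
        System.out.println(containsFaen("faaaeeen"));                 // true
    }
}
```

This only covers the plain spelling; it does nothing about non-letters placed between the letters.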

The real goal here is to match the bad words regardless of the number of vowels, and regardless of any non-letters between the components of the word.

Here are a few examples of what I'm trying to do:

"String containing faen"                        Should match
"String containing sofaen"                      Should not match
"Non-letter-censored string with [email protected]"     Should match
"Non-letter-censored string with [email protected]"   Should not match

Any tips to set me off in the right direction on this?
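One possible direction, sketched under assumptions: generate the pattern from the word itself, so that each letter may repeat, any run of non-letters may sit between the letters, and a negative lookbehind for a letter keeps the match from starting inside a longer word such as "sofaen". The sketch assumes the bad words consist of letters only and Java 11+; buildPattern and censor are hypothetical helper names, and "f@a#e$n" is an invented stand-in for the censored examples above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CensorPattern {
    // Hypothetical helper: builds a pattern from a letters-only word such as "faen".
    // Each letter may repeat ("faaeen"), any run of non-letters may appear between
    // letters ("f@a#e$n"), and (?<!\p{L}) forbids a letter right before the first
    // letter, so the match cannot start inside a longer word like "sofaen".
    static Pattern buildPattern(String word) {
        StringBuilder regex = new StringBuilder("(?<!\\p{L})");
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (!Character.isLetter(c)) {
                throw new IllegalArgumentException("expected letters only: " + word);
            }
            if (i > 0) {
                regex.append("[^\\p{L}]*"); // zero or more non-letters between letters
            }
            regex.append(c).append('+');    // one or more of this letter
        }
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // Hypothetical helper: replaces each match with the same number of asterisks.
    static String censor(String text, Pattern pattern) {
        Matcher m = pattern.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, "*".repeat(m.group().length()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern faen = buildPattern("faen");
        System.out.println(censor("String containing faen", faen));   // String containing ****
        System.out.println(censor("String containing sofaen", faen)); // String containing sofaen
        System.out.println(censor("Censored f@a#e$n here", faen));    // Censored ******* here
    }
}
```

Since spaces also count as non-letters, a match can span several words, and something like "so-faen" still matches because the f is preceded by a non-letter.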

Upvotes: 4

Views: 414

Answers (2)

Fabian Schmengler

Reputation: 24576

It's a terrible idea to begin with. Do you really think your users would write something like "f-aeen" to get around your filter, but would not come up with "ffaen", "-faen" or some other variation you did not prepare for? This is a race you cannot win, and the real loser is usability.

Upvotes: 1

user1596371

Reputation:

You want something like \bf[^\s]*a[^\s]*e[^\s]*n\b. Note that this is the regular expression itself; in Java source code every backslash has to be doubled, giving "\\bf[^\\s]*a[^\\s]*e[^\\s]*n\\b".

Note also that this isn't perfect, but does handle the situations that you have suggested.
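The word-boundary approach from this answer can be tried in Java with a sketch like the following. It assumes "[^\s]*" (zero or more) between the letters, so that the plain "faen" with nothing in between also matches, and it escapes the trailing \b as "\\b" too; the class name and test strings are made up for illustration.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // In a Java string literal every backslash is doubled, including the final \b;
    // a bare "\b" in a string literal is a backspace character, not a word boundary.
    static final Pattern FAEN =
            Pattern.compile("\\bf[^\\s]*a[^\\s]*e[^\\s]*n\\b", Pattern.CASE_INSENSITIVE);

    // True if some word in the text starts with f and ends with n,
    // with a, e (in that order) somewhere in between.
    static boolean matches(String text) {
        return FAEN.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "String containing faen", "String containing sofaen", "with f@a#e$n", "fasten"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

Because [^\s]* also accepts letters, an ordinary word such as "fasten" matches as well, which is one of the ways this is not perfect.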

Upvotes: 2
