Phill

Reputation: 125

Is there a way to find the EXACT string of a word in a discord message?

Currently I am working on a Discord bot that filters messages. My problem occurs when a filtered word is contained in a longer filtered word: both words match, which triggers duplicate messages.

This is my filter.txt:

sad
sadness
sadnesses

Since "sad" is also contained in "sadness", I get a false positive for "sad" whenever "sadness" is written.
Is it possible to detect only the exact word in a message? For example: I want to be happy, because sadness is bad → only "sadness" should be detected.

I hope you understand what I mean.

Code:

public void onGuildMessageReceived(GuildMessageReceivedEvent e) {
    File file = new File("src/filter.txt");
    try {
        BufferedReader br = new BufferedReader(new FileReader(file));
        String line;
        while ((line = br.readLine()) != null) {
            if(!line.startsWith("#")) {
                if(e.getMessage().getContentRaw().contains(line)) {
                    User user = e.getJDA().getUserById(e.getAuthor().getIdLong());
                    e.getMessage().delete().queue();
                    user.openPrivateChannel().queue(privateChannel -> {
                        // German for "Please watch your language!"
                        privateChannel.sendMessage("Bitte achte auf deine Sprache!").queue();
                    });
                }                   
            }
        }
    } catch (IOException e1) {}
}

Upvotes: 5

Views: 9281

Answers (2)

Danny

Reputation: 842

As Cardinal - Reinstate Monica and Hades already said, you should take a look at regex.

'Regex' stands for 'Regular expression' and describes search patterns for strings.

There is a lot you can do using regex, so if you want to know more about it, check out a tutorial.
(It's the first I found when googling, you can use any tutorial of your liking of course.)

For your use case I would suggest the following:

First off, don't use String.contains(): it only checks for a literal substring and does not understand regex.
Use String.matches() instead, which tests whether the whole string matches a regex, with the following pattern:

"(?is).*\\bSTRING\\b.*"

Because there is some escaping done, this is what the regex would look like without it:

(?is).*\bSTRING\b.*

I will explain how it works.

\b

\b matches a word boundary. Word characters are a-z, A-Z, 0-9 and _. Any run of these characters is considered a word.
This has the advantage that you can match the word sad in the following cases:

  • "I am sad." → The . at the end of the sentence doesn't influence the detection.
  • "sad is my thing" → The word is matched even when it's the first one. (This is also influenced by .*.)

When using sadness, it won't match sad, as the word continues afterwards:

  • "I am feeling the sadness!" → Because the word doesn't end after "sad", it's not a match. Matching "sadness" would work.
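
To make the three cases above concrete, here is a minimal sketch. The class name WordBoundaryDemo and the helper containsWord are made up for illustration, and the helper assumes the word consists only of plain word characters.

```java
// Hypothetical demo class; the names are illustrative only.
class WordBoundaryDemo {

    // Returns true if `word` appears in `message` as a whole word.
    // Assumption: `word` contains only word characters (letters, digits, _).
    static boolean containsWord(String message, String word) {
        // \b before and after the word requires a word boundary on both sides;
        // the surrounding .* lets String.matches() accept the rest of the message.
        return message.matches(".*\\b" + word + "\\b.*");
    }

    public static void main(String[] args) {
        System.out.println(containsWord("I am sad.", "sad"));                         // true
        System.out.println(containsWord("sad is my thing", "sad"));                   // true
        System.out.println(containsWord("I am feeling the sadness!", "sad"));         // false
        System.out.println(containsWord("I am feeling the sadness!", "sadness"));     // true
    }
}
```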

.*

. matches any character except line breaks. ((?s) helps me out here.)
* says that the part in front of it occurs zero or more times.
By putting .* before and after the string, the regex accepts any characters (including none) surrounding the word.
That's important because String.matches() only succeeds if the regex matches the entire message, so this way the word is found wherever it appears in a sentence.
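
A small sketch of why the surrounding .* matters: String.matches() has to match the whole input, so the bare \bsad\b pattern only accepts a message that consists of exactly that word. The class and method names here are invented for the example.

```java
// Hypothetical demo class; the names are illustrative only.
class FullMatchDemo {

    // No wildcards: the entire message has to be the word itself.
    static boolean withoutWildcards(String message) {
        return message.matches("\\bsad\\b");
    }

    // Wildcards on both sides: the word may appear anywhere in the message.
    static boolean withWildcards(String message) {
        return message.matches(".*\\bsad\\b.*");
    }

    public static void main(String[] args) {
        System.out.println(withoutWildcards("sad"));        // true
        System.out.println(withoutWildcards("I am sad."));  // false
        System.out.println(withWildcards("sad"));           // true
        System.out.println(withWildcards("I am sad."));     // true
    }
}
```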

(?is)

(?i) and (?s) are inline flags that enable certain modes.
(?i) makes the regex case-insensitive. This means it doesn't matter whether it's sadness, SADNESS or sAdNeSs; all three will match.
(?s) enables 'single-line mode' (DOTALL in Java), which just means that . matches line breaks as well.
The two flags can be combined into (?is) and placed at the start of the regex.
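
The effect of the two flags can be checked with a short sketch like the following. The class and method names are hypothetical; the last method shows what should be the equivalent form using Pattern.compile with the CASE_INSENSITIVE and DOTALL flag constants instead of the inline (?is).

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class FlagsDemo {

    // Same pattern as in the answer, but without any flags.
    static boolean plain(String message) {
        return message.matches(".*\\bsadness\\b.*");
    }

    // Inline flags: (?i) ignores case, (?s) lets . match line breaks.
    static boolean inlineFlags(String message) {
        return message.matches("(?is).*\\bsadness\\b.*");
    }

    // The same flags passed as constants to Pattern.compile.
    static boolean compiledFlags(String message) {
        Pattern p = Pattern.compile(".*\\bsadness\\b.*",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        return p.matcher(message).matches();
    }

    public static void main(String[] args) {
        String upper = "So much SADNESS";
        String multiLine = "first line\nso much sadness";

        System.out.println(plain(upper));              // false: case differs
        System.out.println(plain(multiLine));          // false: . stops at the line break
        System.out.println(inlineFlags(upper));        // true
        System.out.println(inlineFlags(multiLine));    // true
        System.out.println(compiledFlags(multiLine));  // true
    }
}
```

One caveat: by default Java's (?i) only folds ASCII letters. For words with umlauts or other non-ASCII letters, adding the u flag (UNICODE_CASE), as in (?ius), is probably what you want.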

Instead of STRING you just have to insert your words like this:

"(?is).*\\b" + line + "\\b.*"

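One thing to watch out for with this concatenation: the line from filter.txt is inserted into the regex as-is, so characters like . or ( in a filter entry are interpreted as regex syntax. A possible safeguard, sketched here with made-up class and method names, is to wrap the word in Pattern.quote(), which makes the regex engine treat it literally.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuotedWordDemo {

    // The pattern as built in the answer: the word is inserted unchanged.
    static String unquoted(String word) {
        return "(?is).*\\b" + word + "\\b.*";
    }

    // Same pattern, but the word is quoted so regex metacharacters are literal.
    static String quoted(String word) {
        return "(?is).*\\b" + Pattern.quote(word) + "\\b.*";
    }

    public static void main(String[] args) {
        // Unquoted, the . in "a.b" matches any character, so "axb" is a hit.
        System.out.println("axb".matches(unquoted("a.b")));          // true
        // Quoted, only the literal text "a.b" is accepted.
        System.out.println("axb".matches(quoted("a.b")));            // false
        System.out.println("use a.b here".matches(quoted("a.b")));   // true
    }
}
```

An unquoted entry such as ( would even make String.matches() throw a PatternSyntaxException. Note that \b still only helps for entries that start and end with a word character.
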
Your code would look like this in the end:

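public void onGuildMessageReceived(GuildMessageReceivedEvent e) {
    File file = new File("src/filter.txt");
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String content = e.getMessage().getContentRaw();
        String line;
        while ((line = br.readLine()) != null) {
            // Skip comment lines in filter.txt
            if (line.startsWith("#")) {
                continue;
            }
            // Whole-word, case-insensitive match anywhere in the message
            if (content.matches("(?is).*\\b" + line + "\\b.*")) {
                e.getMessage().delete().queue();
                // The author is already a User, so no lookup by ID is needed
                e.getAuthor().openPrivateChannel().queue(privateChannel -> {
                    // German for "Please watch your language!"
                    privateChannel.sendMessage("Bitte achte auf deine Sprache!").queue();
                });
            }
        }
    } catch (IOException e1) {
        // Report the problem instead of silently swallowing it
        e1.printStackTrace();
    }
}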

If you want to send only one warning per message (that is, stop after the first match), you could insert a return; after the message has been deleted and the warning has been sent to the user.
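
If you go the stop-after-the-first-match route, one way to structure it is sketched below. This is not JDA code: WordFilter and firstMatch are hypothetical names, the filter lines are passed in as a list (in the bot they could be read once at startup, for example with Files.readAllLines), and the patterns are precompiled so the file isn't re-read for every message. It uses Matcher.find() instead of matches(), so the surrounding .* isn't needed, and Pattern.quote() so the words are taken literally.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are not part of JDA.
class WordFilter {

    // Maps each filter word to its precompiled whole-word pattern, in file order.
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    WordFilter(List<String> filterLines) {
        for (String line : filterLines) {
            String word = line.trim();
            // Skip blank lines and comment lines starting with #.
            if (word.isEmpty() || word.startsWith("#")) {
                continue;
            }
            patterns.put(word, Pattern.compile(
                    "\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE));
        }
    }

    // Returns the first filter word found as a whole word, or empty if none is.
    Optional<String> firstMatch(String message) {
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            if (entry.getValue().matcher(message).find()) {
                return Optional.of(entry.getKey()); // stop at the first hit
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        WordFilter filter = new WordFilter(
                Arrays.asList("# comment", "sad", "sadness", "sadnesses"));
        System.out.println(filter.firstMatch("I want to be happy, because sadness is bad"));
        // prints Optional[sadness]
        System.out.println(filter.firstMatch("All good here"));
        // prints Optional.empty
    }
}
```

In the listener you would then delete the message and send the warning once if firstMatch(...) returns a value.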

Upvotes: 2

Minn

Reputation: 6134

You could also try a string-searching algorithm such as Aho-Corasick, but that would require implementing a proper signature table. An algorithm like this scales much better to a larger list of words.
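
For reference, a compact sketch of the textbook Aho-Corasick idea (a trie of the filter words plus failure links) could look like this. It is only an illustration under a few assumptions: the class name and methods are made up, matching is done on lowercased text, and the whole-word check afterwards is my own addition to tie it back to the question, not part of the algorithm itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch class; not taken from any library.
class AhoCorasickSketch {

    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;                                        // longest proper suffix that is also a trie prefix
        final List<String> outputs = new ArrayList<>();   // filter words ending at this node
    }

    private final Node root = new Node();

    AhoCorasickSketch(List<String> words) {
        // 1) Build the trie from the lowercased filter words.
        for (String raw : words) {
            String word = raw.toLowerCase(Locale.ROOT);
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(word);
        }
        // 2) Compute failure links breadth-first, so shallower nodes are done first.
        Queue<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            for (Map.Entry<Character, Node> entry : current.next.entrySet()) {
                char c = entry.getKey();
                Node child = entry.getValue();
                Node f = current.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                child.fail = f.next.getOrDefault(c, root);
                // Words ending at the failure target also end here.
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Scans the message once and returns the filter words that occur as whole words.
    List<String> findWholeWords(String message) {
        String text = message.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            node = node.next.getOrDefault(c, root);
            for (String word : node.outputs) {
                int start = i - word.length() + 1;
                // Extra whole-word check: neighbours must not be word characters.
                if (isBoundary(text, start - 1) && isBoundary(text, i + 1)) {
                    hits.add(word);
                }
            }
        }
        return hits;
    }

    private static boolean isBoundary(String text, int index) {
        if (index < 0 || index >= text.length()) {
            return true;
        }
        char c = text.charAt(index);
        return !(Character.isLetterOrDigit(c) || c == '_');
    }

    public static void main(String[] args) {
        AhoCorasickSketch filter = new AhoCorasickSketch(
                List.of("sad", "sadness", "sadnesses"));
        System.out.println(filter.findWholeWords("I want to be happy, because sadness is bad"));
        // prints [sadness]
    }
}
```

The message is scanned once regardless of how many words are in the list, which is where the advantage over one regex per word comes from.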

Note that such algorithms are easily circumvented. Simply adding whitespace or using 1337 character replacement would outsmart a naive word filter.

Upvotes: 0
