Morris Fauntleroy
Morris Fauntleroy

Reputation: 246

Elegant way to check if a String contains a keyword from a Set<String>

I have a TreeSet<String> containing some keywords. I need to test some Strings to see if they contain any of these keywords.

I currently have:

String tweet = object.getText();
for (String keyword : keywords_set)
{
    if(tweet.contains(keyword))
    {
        return true;
    }
}

Is there a more elegant and efficient way of doing this for a stream of Strings?

Upvotes: 1

Views: 1577

Answers (3)

Ryan
Ryan

Reputation: 2091

Here's an explicit example of how to use AhoCorasick.

I created a branch of Robert Bor's java Aho-Corasick implementation that adds a "matches" method which returns true as soon as the first match is found.

Building the search trie is non-trivial. I've included an in-efficient method implementation that matches the code sample you provided. But you really want to amortize the cost of the trie construction across a large number of searches. To do that you really want to change your code that calls the example you included.

I included an example of how to construct the search trie once and them use it for multiple searches.

public boolean doesTweetMatchSlow(String tweet, Set<String> keywords_set)
{
        Trie searchTrie = new Trie();
        for (String keyword : keywords_set) {
            searchTrie.addKeyword(keyword);
        }

        return searchTrie.matches(tweet);
}

public Collection<String> findMatchingTweetsFaster(Iterable<String> tweets, Set<String> keywords_set)
{
    List<String> matching = null;

    if (tweets != null) {
        matching = new ArrayList<>();

        if (keywords_set != null && !keywords_set.isEmpty()) {

            // build trie once.
            Trie searchTrie = new Trie();
            for (String keyword : keywords_set) {
                searchTrie.addKeyword(keyword);
            }

            for (String tweet : tweets) {
                // Re-use trie for every tweet.
                if (searchTrie.matches(tweet)) {
                    matching.add(tweet);
                }
            }
        }
    }
    return matching;
}

Upvotes: 0

Sotirios Delimanolis
Sotirios Delimanolis

Reputation: 280102

You won't get any more efficient than what you have with JDK classes and methods. You need to go through each String in the Set and check if your String contains it.

However, you can make it cleaner if you are willing to use a 3rd party library, Guava.

With Guava, you can use Iterables.any(Iterable, Predicate) which

Returns true if any element in iterable satisfies the predicate.

Use it like so

Set<String> keywords_set = ...
final String tweet = ...

return Iterables.any(keywords_set, new Predicate<String>() {
    @Override
    public boolean apply(String input) {
        return tweet.contains(input);
    }           
});

With Java 8, it will be even cleaner thanks to lambda expressions and aggregate operations.

Upvotes: 2

user3392484
user3392484

Reputation: 1919

Aho-Corasick multiple string matching algorithm: https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm

Upvotes: 1

Related Questions