Reputation: 246
I have a TreeSet<String>
containing some keywords. I need to test some String
s to see if they contain any of these keywords.
I currently have:
String tweet = object.getText();
for (String keyword : keywords_set)
{
if(tweet.contains(keyword))
{
return true;
}
}
Is there a more elegant and efficient way of doing this for a stream of Strings?
Upvotes: 1
Views: 1577
Reputation: 2091
Here's an explicit example of how to use AhoCorasick.
I created a branch of Robert Bor's java Aho-Corasick implementation that adds a "matches" method which returns true as soon as the first match is found.
Building the search trie is non-trivial. I've included an in-efficient method implementation that matches the code sample you provided. But you really want to amortize the cost of the trie construction across a large number of searches. To do that you really want to change your code that calls the example you included.
I included an example of how to construct the search trie once and them use it for multiple searches.
public boolean doesTweetMatchSlow(String tweet, Set<String> keywords_set)
{
Trie searchTrie = new Trie();
for (String keyword : keywords_set) {
searchTrie.addKeyword(keyword);
}
return searchTrie.matches(tweet);
}
public Collection<String> findMatchingTweetsFaster(Iterable<String> tweets, Set<String> keywords_set)
{
List<String> matching = null;
if (tweets != null) {
matching = new ArrayList<>();
if (keywords_set != null && !keywords_set.isEmpty()) {
// build trie once.
Trie searchTrie = new Trie();
for (String keyword : keywords_set) {
searchTrie.addKeyword(keyword);
}
for (String tweet : tweets) {
// Re-use trie for every tweet.
if (searchTrie.matches(tweet)) {
matching.add(tweet);
}
}
}
}
return matching;
}
Upvotes: 0
Reputation: 280102
You won't get any more efficient than what you have with JDK classes and methods. You need to go through each String
in the Set
and check if your String
contains it.
However, you can make it cleaner if you are willing to use a 3rd party library, Guava.
With Guava, you can use Iterables.any(Iterable, Predicate)
which
Returns true if any element in iterable satisfies the predicate.
Use it like so
Set<String> keywords_set = ...
final String tweet = ...
return Iterables.any(keywords_set, new Predicate<String>() {
@Override
public boolean apply(String input) {
return tweet.contains(input);
}
});
With Java 8, it will be even cleaner thanks to lambda expressions and aggregate operations.
Upvotes: 2
Reputation: 1919
Aho-Corasick multiple string matching algorithm: https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
Upvotes: 1