yulai

Reputation: 761

Check if string contains word (not substring!)

Is there a way to check whether a string contains an entire WORD, and not just a substring?

Envision the following scenario:

public class Test {
    public static void main(String[] args) {
        String[] text = {"this is a", "banana"};
        String search = "a";

        int counter = 0;
        for(int i = 0; i < text.length; i++) {
            if(text[i].toLowerCase().contains(search)) {
                counter++;
            }
        }

        System.out.println("Counter was " + counter);
    }
}

This evaluates to

Counter was 2

Which is not what I'm looking for, as there is only one instance of the word 'a' in the array.

The way I read it is as follows:

The if-test finds an 'a' in text[0], the 'a' corresponding to "this is [a]". However, it also finds occurrences of 'a' in "banana", and thus increments the counter.

How can I solve this to only include the WORD 'a', and not substrings containing a?

Thanks!

Upvotes: 2

Views: 6384

Answers (5)

Peter Lawrey

Reputation: 533880

You could use a regex, using Pattern.quote to escape out any special characters.

// Requires: import java.util.regex.Pattern;
String regex = ".*\\b" + Pattern.quote(search) + "\\b.*"; // \b is a word boundary

int counter = 0;
for(int i = 0; i < text.length; i++) {
    if(text[i].toLowerCase().matches(regex)) {
        counter++;
    }
}

Note that this will also find "a" in "this is a; pause" or "Looking for an a?", where the "a" is followed by punctuation rather than a space.
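For a complete version you can run, here is a minimal sketch of the same word-boundary idea. It swaps matches() for a precompiled Pattern with find(), so the surrounding ".*" is not needed, and it uses Pattern.CASE_INSENSITIVE instead of toLowerCase(). The class name WordBoundaryDemo and the method countContainingWord are made up for illustration, as is the third sample string.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Counts the strings in `texts` that contain `word` as a whole word.
    // (Hypothetical helper, not part of any library.)
    static int countContainingWord(String[] texts, String word) {
        // \b matches at a word boundary; Pattern.quote makes `word` a literal.
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        int counter = 0;
        for (String text : texts) {
            // find() searches anywhere in the string, unlike matches(),
            // which must match the whole string.
            if (pattern.matcher(text).find()) {
                counter++;
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        String[] text = {"this is a", "banana", "Looking for an A?"};
        // "banana" is skipped; the other two contain "a" as a word.
        System.out.println("Counter was " + countContainingWord(text, "a"));
    }
}
```

One caveat: by default \b only treats letters, digits and underscore as word characters, so a search term that starts or ends with something else (for example "c++") will probably not behave the way you expect.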

Upvotes: 6

GhostCat

Reputation: 140633

Of course, as others have written, you can start playing around with all kinds of patterns to match "words" out of "text".

But the thing is: depending on the underlying problem you have to solve, this might not be good enough (by far). Meaning: are you facing the problem of finding some pattern in some string, or do you really want to interpret that text in the "human language" sense? You know, when somebody writes down text, there might be subtle typos, strange characters, all kinds of stuff that makes it hard to really "find" a certain word in that text, unless you dive into the "language processing" aspect of things.

Long story short: if your job is "locate certain patterns in strings", then all the other answers will do. But if your requirement goes beyond that, like "some human will be using your application to 'search' huge data sets", then you had better stop now and consider turning to full-text enabled search engines like ElasticSearch or Solr.

Upvotes: 0

Michele Da Ros

Reputation: 906

Arrays.asList("this is a banana".split(" ")).stream().filter((s) -> s.equals("a")).count();
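The one-liner above counts occurrences of "a" within a single string. A possible way to adapt it to the array from the question, counting each string at most once, is sketched below. The class name StreamWordCount and the method countContainingWord are invented for this example, and it assumes words are separated by whitespace only.

```java
import java.util.Arrays;

class StreamWordCount {
    // Counts the strings that have `word` among their whitespace-separated
    // tokens. (Hypothetical helper for illustration.)
    static long countContainingWord(String[] texts, String word) {
        return Arrays.stream(texts)
                .filter(t -> Arrays.stream(t.trim().split("\\s+"))
                        .anyMatch(token -> token.equalsIgnoreCase(word)))
                .count();
    }

    public static void main(String[] args) {
        String[] text = {"this is a", "banana"};
        System.out.println("Counter was " + countContainingWord(text, "a"));
    }
}
```

Because it splits on whitespace only, a token like "a?" or "a;" will not be equal to "a".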

Upvotes: 0

rev_dihazum

Reputation: 818

You could try it this way:

for (int i = 0; i < text.length; i++) {
    String[] words = text[i].split("\\s+");
    for (String word : words) {
        if (word.equalsIgnoreCase(search)) {
            counter++;
            break; // count each string at most once
        }
    }
}

Upvotes: 1

dryairship

Reputation: 6077

If the words are separated by a space, then you can do:

if((" "+text[i].toLowerCase()+" ").contains(" "+search+" "))
{
   ...
}

This pads the original String with a space on each side,
e.g. "this is a" becomes " this is a ".

Then it searches for the word with the flanking spaces, e.g. it searches for " a " when search is "a".
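To see where the padding trick works and where it stops working, here is a small sketch. The class name PaddedContainsDemo, the method containsWord and the third sample string are made up for illustration. It also lower-cases the search term, which the snippet above does not do.

```java
class PaddedContainsDemo {
    // True if `word` appears in `text` with a space (or the string's
    // start or end) on both sides. (Hypothetical helper.)
    static boolean containsWord(String text, String word) {
        String padded = " " + text.toLowerCase() + " ";
        return padded.contains(" " + word.toLowerCase() + " ");
    }

    public static void main(String[] args) {
        System.out.println(containsWord("this is a", "a"));  // true
        System.out.println(containsWord("banana", "a"));     // false
        // Punctuation directly after the word defeats the padding trick.
        System.out.println(containsWord("is this a?", "a")); // false
    }
}
```

So this is fine when the input really is space-separated, but it misses words next to punctuation, tabs or newlines.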

Upvotes: 0
