nicost

Reputation: 1040

Find two consecutive words/strings with a regex in Java (including punctuation)

I want to check whether a string contains two words/strings that directly follow each other in a specific order. Punctuation should be treated as part of the word/string (i.e. "word" and "word." should be handled as different words).

As an example:

    String word1 = "is";
    String word2 = "a";
    String text = "This is a sample";

    Pattern p = Pattern.compile("someregex" + word1 + "someregex" + word2 + "someregex");
    System.out.println(p.matcher(text).matches());

This should print out true.

With the following variables, it should also print true.

    String word1 = "sample.";
    String word2 = "0END";
    String text = "This is a sample. 0END0";

But the latter should return false when setting word1 = "sample" (without punctuation).

Does anyone have an idea what the regex string should look like (i.e. what I should write instead of "someregex")?

Thank you!

Upvotes: 0

Views: 1551

Answers (2)

Jamie Cockburn

Reputation: 7555

It looks like your words are simply separated by whitespace, so try:

    Pattern p = Pattern.compile("(\\s|^)" + Pattern.quote(word1) + "\\s+" + Pattern.quote(word2) + "(\\s|$)");

Explanation

(\\s|^) matches any whitespace before the first word, or the start of the string

\\s+ matches the whitespace between the words

(\\s|$) matches any whitespace after the second word, or the end of the string

Pattern.quote(...) ensures that any regex special characters in your input strings are properly escaped.
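To illustrate why the quoting matters, here is a minimal sketch. The class name QuoteSketch, the helper found and the sample strings are made up for illustration; only Pattern.quote, Pattern.compile and Matcher.find come from the standard library.

```java
import java.util.regex.Pattern;

class QuoteSketch {
    // Hypothetical helper: true if the regex occurs anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String word = "sample.";
        // Unquoted: "." is a wildcard, so "sampleX" is accepted as well.
        System.out.println(found(word, "a sampleX here"));                // true
        // Quoted: "." only matches a literal dot.
        System.out.println(found(Pattern.quote(word), "a sampleX here")); // false
        System.out.println(found(Pattern.quote(word), "a sample. here")); // true
    }
}
```

Without quoting, a word such as "sample." would silently match text the asker wants to reject.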

You also need to call find(), not matches(). matches() only returns true if the entire string matches the pattern.
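The difference between the two Matcher methods can be seen in a small sketch. The class name FindVsMatches and the simplified pattern are assumptions chosen for illustration, not part of the answer's final regex.

```java
import java.util.regex.Pattern;

class FindVsMatches {
    // Simplified illustrative pattern: "is", whitespace, then "a".
    static final Pattern P = Pattern.compile("is\\s+a");

    public static void main(String[] args) {
        String text = "This is a sample";
        // matches() requires the whole input to fit the pattern.
        System.out.println(P.matcher(text).matches()); // false
        // find() looks for the pattern anywhere inside the input.
        System.out.println(P.matcher(text).find());    // true
    }
}
```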

Complete example

    String word1 = "is";
    String word2 = "a";
    String text = "This is a sample";

    String regex =
        "(\\s|^)" +
        Pattern.quote(word1) +
        "\\s+" +
        Pattern.quote(word2) +
        "(\\s|$)";

    Pattern p = Pattern.compile(regex);
    System.out.println(p.matcher(text).find());
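As a rough sketch, the same pattern can be wrapped in a helper and tried against inputs similar to those in the question. The class name ConsecutiveWords and the method containsConsecutive are hypothetical names introduced here; the test strings are assumptions too.

```java
import java.util.regex.Pattern;

class ConsecutiveWords {
    // Hypothetical helper: true if word1 is directly followed by word2,
    // with both delimited by whitespace or the string boundaries.
    static boolean containsConsecutive(String text, String word1, String word2) {
        String regex = "(\\s|^)" + Pattern.quote(word1)
                     + "\\s+" + Pattern.quote(word2) + "(\\s|$)";
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "This is a sample. 0END0";
        System.out.println(containsConsecutive(text, "is", "a"));          // true
        System.out.println(containsConsecutive(text, "sample.", "0END0")); // true
        // "sample" is followed by ".", not whitespace, so this fails.
        System.out.println(containsConsecutive(text, "sample", "0END0"));  // false
        // "0END" is only a prefix of "0END0", so the trailing boundary fails.
        System.out.println(containsConsecutive(text, "sample.", "0END"));  // false
    }
}
```

Note that with this pattern a word only counts when it is whitespace-delimited on both sides, so a prefix such as "0END" is not found inside "0END0".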

Upvotes: 1

toydarian

Reputation: 4554

You can concatenate the two words with whitespace and use that as the regexp. The only thing you have to do is replace "." with "\." so that the dot does not match any character.

    String regexp = " " + word1 + " " + word2 + " ";
    regexp = regexp.replaceAll("\\.", "\\\\.");
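A small sketch of what these two lines produce, assuming they are wrapped in a hypothetical helper named buildRegex (the class name and sample texts are also made up for illustration):

```java
import java.util.regex.Pattern;

class DotEscapeSketch {
    // Hypothetical helper: builds the regex exactly as in the two lines above.
    static String buildRegex(String word1, String word2) {
        String regexp = " " + word1 + " " + word2 + " ";
        return regexp.replaceAll("\\.", "\\\\.");
    }

    public static void main(String[] args) {
        String regex = buildRegex("sample.", "0END0");
        // The dot is now escaped; the surrounding spaces are part of the regex.
        System.out.println("[" + regex + "]");
        // Matches when both words are surrounded by spaces.
        System.out.println(Pattern.compile(regex).matcher("This is a sample. 0END0 now").find()); // true
        // No trailing space after the second word, so nothing is found.
        System.out.println(Pattern.compile(regex).matcher("This is a sample. 0END0").find());     // false
    }
}
```

Because of the literal leading and trailing spaces, this variant seems to require the pair to sit in the middle of the text, and it only escapes dots, not other regex metacharacters.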

Upvotes: 0
