Mahendran

Reputation: 1

Composing a regular expression in Java

I need to compose a regular expression that satisfies the conditions below.

The conditions are:

  1. I want to return true/false depending on whether a particular word is present in the paragraph.
  2. The word can be anywhere (at the beginning, in the middle, or at the end).
  3. It should match whole words only, with one exception: the word may be preceded or followed by a single special character such as ,.;()[]{} etc.
  4. The search is case insensitive.

In the code below I am searching for the word Positive, which I have hardcoded in the regex. In this case the output should be false, but it returns true, and I am not sure how to fix it.

String inputStr = "ssdf Positiveasd asd sdfewrewr asd";
inputStr = inputStr.toUpperCase();

String patternStr = "[^a-z]*[\\s]?[^\\d\\w]?[POSITIVE\b]+[^a-z]*";
Pattern pattern = Pattern.compile(patternStr);

Matcher matcher = pattern.matcher(inputStr);
boolean matchFound = matcher.matches();

System.out.println(matchFound);

Upvotes: 0

Views: 679

Answers (7)

stema

Reputation: 92976

  1. You need double escaping, so \b should become \\b.

  2. Don't put "POSITIVE" in square brackets. That creates a character class, which matches any one of the included characters.

    Replace [POSITIVE\b]+ with POSITIVE\b (written as POSITIVE\\b in the Java string, per point 1).

If I understand your requirements correctly, you should only need (?i)\\bpositive\\b

The (?i) makes your inputStr.toUpperCase() unnecessary, because it makes the match case insensitive. The \\b is a word boundary: it matches where there is no word character before and no word character after your word "positive". Note that the test code uses matcher.find() rather than matcher.matches(), because matches() only succeeds when the whole input matches the pattern.

Test Code

String s1 = "ssdf Positiveasd asd sdfewrewr asd";
String s2 = "ssdf Positive asd asd sdfewrewr asd";
String s3 = "ssdf poSiTive asd sdfewrewr asd";
String s4 = "ssdf FooPositive asd sdfewrewr asd";

String[] s = { s1, s2, s3, s4 };
String regex = "(?i)\\bpositive\\b";

for(String a : s) {
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(a);
    
    if (matcher.find())
        System.out.println(a + " ==> Success");
    else
        System.out.println(a + " ==> Failure");
}

Output

ssdf Positiveasd asd sdfewrewr asd ==> Failure
ssdf Positive asd asd sdfewrewr asd ==> Success
ssdf poSiTive asd sdfewrewr asd ==> Success
ssdf FooPositive asd sdfewrewr asd ==> Failure
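
If the word is not hardcoded, the same idea can be wrapped in a small helper. This is only a sketch: the names WholeWordSearch and containsWord are made up for illustration, and it assumes the word starts and ends with a word character (otherwise \b will not behave as expected at that end). Pattern.quote is used so that any regex metacharacters in the word are treated literally.

```java
import java.util.regex.Pattern;

class WholeWordSearch {
    // Hypothetical helper: true if `word` occurs in `text` as a whole word, ignoring case.
    static boolean containsWord(String text, String word) {
        // Pattern.quote escapes any regex metacharacters inside the word.
        String regex = "\\b" + Pattern.quote(word) + "\\b";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("ssdf Positiveasd asd", "positive")); // false
        System.out.println(containsWord("ssdf (poSiTive), asd", "positive")); // true
    }
}
```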

Upvotes: 1

EMS

Reputation: 188

If I'm understanding you, you want to match things like

Positive; blah
Positive blah
blah Positive blah

But not things like your example string or

Positive;; blah
;Positive

Is that right? If so, I feel like you're overcomplicating things a bit with your expression...

How about something like this?

String patternStr = "[^\\s]+POSITIVE[\\b]?[$\\s]*";
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(inputStr);
boolean matchFound = matcher.find();

Also, you'd probably want to make sure that your definition of "special character" is the same as what's meant by the \b word boundary.
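
To make that last point concrete, here is a small sketch of how \b treats a few characters. The class name WordBoundaryDemo and the sample strings are just for illustration. For \b, digits and the underscore count as word characters, and \b on its own does not limit how many special characters follow the word.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    static final Pattern P = Pattern.compile("\\bpositive\\b", Pattern.CASE_INSENSITIVE);

    // True if the pattern is found anywhere in the text.
    static boolean found(String text) {
        return P.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("Positive; blah"));  // true: ';' is not a word character
        System.out.println(found("Positive;; blah")); // true: \b does not count special characters
        System.out.println(found("Positive_ blah"));  // false: '_' is a word character
        System.out.println(found("Positive1 blah"));  // false: digits are word characters
    }
}
```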

Upvotes: 1

aroth

Reputation: 54806

It might be simpler to do something like:

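public boolean doesInputContainWord(String inputStr, String word) {
    // Replace every non-letter with a space, and pad both ends so that a word
    // at the very start or end of the input is also surrounded by spaces.
    inputStr = " " + inputStr.toLowerCase().replaceAll("[^a-z]", " ") + " ";
    word = " " + word.toLowerCase() + " ";
    return inputStr.contains(word);
}

This replaces every character in the input string that is not a letter with a space, pads the result with a space on each side, and then checks whether the transformed text contains the word. Note that the search term is <space> + <word> + <space>.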

Or, if you really want to use a regex to do the matching, then I would suggest removing the [] around "POSITIVE", as well as the \b and the + that comes after. The brackets are defining a character class, which is not what you want in this case. You want to look for the literal text "POSITIVE". The [POSITIVE]+ would match things like "OOST" and "VIVE" and pretty much any string that contains one or more letters from the word "Positive".
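
A quick sketch of the difference between the character class and the literal text (the class name CharacterClassDemo is just for illustration):

```java
class CharacterClassDemo {
    // String.matches requires the whole string to match the regex.
    static boolean matchesClass(String s) {
        return s.matches("[POSITIVE]+"); // one or more of the letters P, O, S, I, T, V, E
    }

    static boolean matchesLiteral(String s) {
        return s.matches("POSITIVE"); // exactly the text POSITIVE
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("OOST"));       // true
        System.out.println(matchesClass("VIVE"));       // true
        System.out.println(matchesLiteral("OOST"));     // false
        System.out.println(matchesLiteral("POSITIVE")); // true
    }
}
```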

Upvotes: 0

Konstantin Pribluda

Reputation: 12367

(\bPOSITIVE\b) 

should do the trick (says my cool regex debugger). Square brackets define a character class, while round brackets define a group (do not forget to double the \ in a Java string literal).

Upvotes: 1

joelkema

Reputation: 19

You could also use

if(inputStr.indexOf("Positive") >= 0){
   //Word is found (indexOf returns 0 when the input starts with the word)
}

Upvotes: -1

Tim Pietzcker

Reputation: 336108

One of your problems is that, in a Java string literal, \b means "backspace character" (which obviously isn't present in the string you're trying to match).

You want \\b, which the regex engine receives as \b (word boundary) once string processing is done. Don't forget that you need to escape backslashes in a Java string.
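
A minimal sketch that shows the difference (the class name EscapeDemo and the sample text are made up for illustration):

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True if the regex is found anywhere in the text.
    static boolean find(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println("\b".length());          // 1: a single backspace character
        System.out.println((int) "\b".charAt(0));   // 8: the code point of backspace
        System.out.println("\\b".length());         // 2: a backslash followed by 'b'

        System.out.println(find("\bcat\b", "a cat b"));   // false: looks for literal backspaces
        System.out.println(find("\\bcat\\b", "a cat b")); // true: word boundaries
    }
}
```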

I would have constructed the regex much differently, though. However, I don't understand what you mean by your requirement no. 3. Could you provide a few examples to illustrate this?
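
In the meantime, here is a sketch under one possible reading of requirement 3. It assumes the word may have at most one ASCII punctuation character directly before it and at most one directly after it, and that the whole thing must be delimited by whitespace or by the start/end of the input. The names StrictWordSearch and containsWordStrict are hypothetical, and the reading itself is a guess, so adjust it once the requirement is clear.

```java
import java.util.regex.Pattern;

class StrictWordSearch {
    // Hypothetical helper for one reading of requirement 3 (an assumption, not confirmed).
    static boolean containsWordStrict(String text, String word) {
        String regex = "(?<!\\S)"      // preceded by whitespace or start of input
                + "\\p{Punct}?"        // at most one punctuation character before the word
                + Pattern.quote(word)  // the word itself, taken literally
                + "\\p{Punct}?"        // at most one punctuation character after the word
                + "(?!\\S)";           // followed by whitespace or end of input
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWordStrict("ssdf Positiveasd asd sdfewrewr asd", "positive")); // false
        System.out.println(containsWordStrict("ssdf (Positive) asd", "positive"));                // true
        System.out.println(containsWordStrict("Positive;; blah", "positive"));                    // false
    }
}
```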

Upvotes: 1

duffymo

Reputation: 308743

Try removing the word boundary \b and see if it returns true.

Upvotes: 0
