loic

Reputation: 35

Java regular expression find substring

I'm trying to find a specific word in a string in Java. I wrote a function that aims to return the matched string. This is what I have so far:

public static String getValueByregexExpr (String str, String regexExpr) {
    Pattern regex = Pattern.compile (regexExpr, Pattern.DOTALL);
    Matcher matcher1 = regex.matcher (str);
    if (matcher1.find ()) {
        if (matcher1.groupCount () != 0 && matcher1.group (1) != null) {
            for (int i = 0; i <= matcher1.groupCount (); i++) {
                System.out.println ("matcher " + i + " for regex " + regexExpr + "= " + matcher1.group (i));
            }
            return matcher1.group (1);
        }
        return regexExpr;
    }
    return null;
}

My issue is the following: I would like a regex that puts the word I'm looking for into group(1). But for now, this code:

public static void main (String[] args) {

    String str = "HELLO_WORLD_123456 TEst";

    System.out.println ("First test");
    String regex1 = ".*WORLD.*";
    String matchedString = Util.getValueByregexExpr (str, regex1);
    //Here, I want to obtain matchedString = WORLD
    if (matchedString == null) {
        System.out.println ("matchedString null");
    } else if (matchedString.equals (regex1)) {
        System.out.println ("String found but empty group(1)");
    } else {
        System.out.println ("Result : " + matchedString);
    }

    //Here, I want to obtain matchedString = WORLD_123456
    System.out.println ("\nSecond test");
    String regex2 = "WORLD_([^_]+)";
    matchedString = Util.getValueByregexExpr (str, regex2);
    if (matchedString == null) {
        System.out.println ("regex " + regex2 + " matchedString null");
    } else if (matchedString.equals (regex2)) {
        System.out.println ("regex " + regex2 + " String found but empty group(1)");
    } else {
        System.out.println ("regex " + regex2 + " Result : " + matchedString);
    }

}

gives me this output:

First test:
regex .*WORLD.* String found but empty group(1)

Second test:
matcher 0 for regex WORLD_([^_]+)= WORLD_123456
matcher 1 for regex WORLD_([^_]+)= 123456
regex WORLD_([^_]+) Result : 123456

First, is there any regular expression which can return:

- WORLD for the first test
- WORLD_123456 for the second test

Second, I initially thought that the result would always be placed in group(1) as long as there is only one result. But I'm obviously wrong, given the result of test 2. Could someone give me more information about this?

Thank you for your help.

Upvotes: 2

Views: 173

Answers (2)

Wiktor Stribiżew

Reputation: 627119

To fix the first one, just add the capturing group:

String regex1 = ".*(WORLD).*";

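To fix the second, wrap the whole pattern in the capturing group and add whitespace (\s) to the negated character class, so the match stops at the first space: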

String regex2 = "(WORLD_[^_\\s]+)";

See demo

The main reason the first part of your code did not work as expected is that regex1 has no capturing group, which is what your getValueByregexExpr checks for. The second one returned only the part of the string captured by the ([^_]+) part of the regex.
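As a minimal sketch of the difference between group(0) (the whole match) and group(1) (the first capturing group), the class below runs a few patterns against the string from the question. The class name GroupDemo and the helper extract are hypothetical and only serve this illustration; they are not part of the asker's Util class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {

    // Hypothetical helper: returns the requested group of the first match,
    // or null when the pattern is not found anywhere in the input.
    public static String extract(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String str = "HELLO_WORLD_123456 TEst";

        // group(1) is whatever the first pair of parentheses captured.
        System.out.println(extract(str, ".*(WORLD).*", 1));      // WORLD

        // With find(), group(0) is the matched substring, so no group is needed.
        System.out.println(extract(str, "WORLD", 0));            // WORLD

        // [^_] also matches spaces, so the capture runs to the end of the string.
        System.out.println(extract(str, "WORLD_([^_]+)", 1));    // 123456 TEst

        // Excluding whitespace stops the match at the first space.
        System.out.println(extract(str, "(WORLD_[^_\\s]+)", 1)); // WORLD_123456
        System.out.println(extract(str, "WORLD_[^_\\s]+", 0));   // WORLD_123456
    }
}
```

Because find() searches for the pattern anywhere in the input, the leading and trailing .* are only needed when the whole string must be consumed, for example with matches().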

Upvotes: 1

Shrinivas Shukla

Reputation: 4463

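In a regex, each pair of plain, unescaped parentheses () defines a capturing group. Groups are numbered from 1 by the position of their opening parenthesis, and group(0) is always the entire match.

Correct your regexes: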

String regex1 = ".*(WORLD).*";


String regex2 = "(WORLD_[^_\\s]+)";
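To illustrate how parentheses affect group numbering, here is a small sketch that reports how many capturing groups a pattern defines. The class name GroupCounting and the helper countGroups are hypothetical names chosen for this example; the only library calls used are Pattern.compile, Pattern.matcher and Matcher.groupCount.

```java
import java.util.regex.Pattern;

public class GroupCounting {

    // Hypothetical helper: number of capturing groups defined by the pattern.
    // groupCount() depends only on the pattern, not on the input, and it
    // does not count group 0 (the whole match).
    public static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(countGroups(".*WORLD.*"));         // 0: no parentheses
        System.out.println(countGroups(".*(WORLD).*"));       // 1
        System.out.println(countGroups("(WORLD_[^_\\s]+)"));  // 1
        System.out.println(countGroups("(?:WORLD)_(\\d+)"));  // 1: (?:...) does not capture
        System.out.println(countGroups("\\(WORLD\\)"));       // 0: escaped parentheses are literals
    }
}
```

This is why the original regex1 fell into the "empty group(1)" branch: its groupCount() is 0, so there is no group(1) to return.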

Upvotes: 0
