Andrei Vasilev

Reputation: 607

How to find a substring between two \n using regex?

I need to locate a person's name in the following string:

 TI35635: 71-3463463409 wa36ued i56tle Ro356 IL
    Involved Subject
     Name: PETER SMITH
     Address: 1 MAIN AVE

So the rule I need to follow is this: the substring is whatever comes right after "Subject \n+ Name:" and before "hits \n". I must follow this rule because some words in the original string (too long to post here) may not be unique.

I tried the following:

Pattern patternName = Pattern.compile("(?:Subject.?)(\\n)(Name:.*?)\\n", Pattern.DOTALL);
Matcher matcherName = patternName.matcher(text);
matcherName.find();

Please help me correct it

Upvotes: 1

Views: 197

Answers (5)

Reimeus

Reputation: 159874

Just skip the whitespace before attempting to match the group containing the name. You can use \s, which matches not only spaces but also newline characters.

Pattern patternName = 
           Pattern.compile("(?:Subject.?)\\s+(Name:.*?)\\n", Pattern.DOTALL);

Group 1 contains:

Name: PETER SMITH

Read the Pattern javadoc for a full list of characters matched by \s
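
For completeness, here is a minimal runnable sketch of the pattern above applied to the sample text from the question. The class name SubjectNameDemo and the helper findNameLine are made up for illustration. Note that this pattern needs a "\n" after the name line, so it assumes the name is not the last line of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubjectNameDemo {
    // Same pattern as above: "Subject", an optional extra character,
    // whitespace (which includes the newline), then the "Name:" line.
    private static final Pattern NAME_LINE =
            Pattern.compile("(?:Subject.?)\\s+(Name:.*?)\\n", Pattern.DOTALL);

    // Hypothetical helper: returns group 1, or null when nothing matches.
    static String findNameLine(String text) {
        Matcher m = NAME_LINE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = " TI35635: 71-3463463409 wa36ued i56tle Ro356 IL\n"
                + "    Involved Subject\n"
                + "     Name: PETER SMITH\n"
                + "     Address: 1 MAIN AVE";
        System.out.println(findNameLine(text)); // prints "Name: PETER SMITH"
    }
}
```

Checking the result of find() before calling group(1) avoids an IllegalStateException when the text has no matching "Subject" block.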

Upvotes: 1

Bohemian

Reputation: 425418

You can do it in just one line:

String name = str.replaceAll("(?sm).*Subject\\s+Name:(.*?)?$.*", "$1");

If the name is not found, nothing is replaced and the result is the original string, unchanged.

I've also made it so it will work on Windows files too.


Here's some test code:

String str = " TI35635: 71-3463463409 wa36ued i56tle Ro356 IL\n    Involved Subject\n     Name: PETER SMITH\n     Address: 1 MAIN AVE";
String name = str.replaceAll("(?sm).*Subject\\s+Name:(.*?)?$.*", "$1");
System.out.println("Name = " + name);

Output:

Name = PETER SMITH

Upvotes: 1

La-comadreja

Reputation: 5755

You could represent the Regex for the name as:

([ \\t\\x0B\\f\\r]*[a-zA-Z]+)*

This represents a sequence of zero or more of the following: zero or more spaces (non-newlines), followed by one or more letters. Should handle the names within your larger Regex.

Alternatively, \s represents whitespace (although it includes newlines) and \w represents any letter, digit, or underscore.
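
As a rough sketch of how that name sub-pattern could be plugged into a larger regex: the surrounding "Subject\s+Name:" prefix is taken from the question, while the class NamePatternDemo and the method extractName are hypothetical names. The repeated group is wrapped in an outer capturing group, because a repeated capturing group on its own only keeps its last iteration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamePatternDemo {
    // Outer group 1 captures the whole run of "optional spaces + letters" words.
    // The inner group is non-capturing so that group 1 holds the full name.
    private static final Pattern NAME =
            Pattern.compile("Subject\\s+Name:((?:[ \\t\\x0B\\f\\r]*[a-zA-Z]+)*)");

    // Hypothetical helper: returns the trimmed name, or null if the prefix is absent.
    static String extractName(String text) {
        Matcher m = NAME.matcher(text);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String text = "    Involved Subject\n     Name: PETER SMITH\n     Address: 1 MAIN AVE";
        System.out.println(extractName(text)); // prints "PETER SMITH"
    }
}
```

Because the character class only allows letters, a name containing digits, hyphens, or apostrophes would be cut off at the first such character.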

Upvotes: 1

Maxim Shoustin

Reputation: 77930

Your example has 3 groups, which by my estimate can cost up to O(n^3), where n is the number of characters.

Generally regex is good if we want to replace multiple times.

In your case regex is too expensive (in my view). I would use the following example instead:

String str = "TI35635: 71-3463463409 wa36ued i56tle Ro356 IL\r\n" + 
                "    Involved Subject\r\n" + 
                "     Name: PETER SMITH\r\n" + 
                "     Address: 1 MAIN AVE";

    StringBuilder buff = new StringBuilder();

    // Split on \n or \r\n explicitly; the platform line.separator may not match the input
    for(String line : str.split("\\r?\\n")){
        if(line.contains("Name: ")){
            String temp = line.split(": ")[0];
            temp = temp + ": " + "New Name"; 
            buff.append(temp).append("\n");
        }
        else{
            buff.append(line).append("\n");
        }           
    }       

    System.out.println(buff.toString());

Output:

TI35635: 71-3463463409 wa36ued i56tle Ro356 IL
    Involved Subject
     Name: New Name
     Address: 1 MAIN AVE
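
The code above replaces the name. If the goal is only to read it, as in the question, the same line-by-line idea can be sketched as follows. This is a minimal sketch that assumes the "Name:" line directly follows a line ending in "Subject"; the class LineScanDemo and the method nameAfterSubject are illustrative names only.

```java
class LineScanDemo {
    // Hypothetical helper: returns the text after "Name:" on the line that
    // immediately follows a line ending in "Subject", or null if not found.
    static String nameAfterSubject(String text) {
        String[] lines = text.split("\\r?\\n"); // handles both \n and \r\n
        for (int i = 0; i + 1 < lines.length; i++) {
            if (lines[i].trim().endsWith("Subject")) {
                String next = lines[i + 1].trim();
                if (next.startsWith("Name:")) {
                    return next.substring("Name:".length()).trim();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String str = "TI35635: 71-3463463409 wa36ued i56tle Ro356 IL\r\n"
                + "    Involved Subject\r\n"
                + "     Name: PETER SMITH\r\n"
                + "     Address: 1 MAIN AVE";
        System.out.println(nameAfterSubject(str)); // prints "PETER SMITH"
    }
}
```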

Upvotes: 1

Paul Vargas

Reputation: 42060

You can try the regular expression:

Subject[ ]*\r?\n[ ]*(Name:.*)

e.g.:

private static final Pattern REGEX_PATTERN = 
        Pattern.compile("Subject[ ]*\\r?\\n[ ]*(Name:.*)");

public static void main(String[] args) {
    String input = "TI35635: 71-3463463409 wa36ued i56tle Ro356 IL\n    Involved Subject\n     Name: PETER SMITH\n     Address: 1 MAIN AVE";

    Matcher matcher = REGEX_PATTERN.matcher(input);
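    // find() advances through the input; find(1) would reset the matcher on every call and loop forever
    while (matcher.find()) {
        // group(1) is the captured "Name: ..." part; group() would also include "Subject" and the newline
        System.out.println(matcher.group(1));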
    }
}

Output:

Name: PETER SMITH

Upvotes: 1
