Anna

Reputation:

Java string - get everything between (but not including) two regular expressions?

In Java, is there a simple way to extract a substring by specifying the regular expression delimiters on either side, without including the delimiters in the final substring?

For example, if I have a string like this:

<row><column>Header text</column></row>

what is the easiest way to extract the substring:

Header text

Please note that the substring may contain line breaks...

thanks!

Upvotes: 11

Views: 24140

Answers (2)

You should not use regular expressions to parse XML. It will eventually break unless the input is strictly controlled.

The easiest thing is probably to parse the XML into a DOM tree (Java 1.4 and newer include an XML parser) and then navigate the tree to pick out what you need.
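As a rough illustration of the DOM approach, here is a minimal sketch using the JAXP parser that ships with the JDK. The class name ColumnText and the helper firstColumnText are made up for this example, and it assumes the input is a well-formed XML document containing a column element, as in the question. getTextContent needs Java 5 or newer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ColumnText {

    // Hypothetical helper: returns the text of the first <column> element,
    // or null if the document has no such element.
    public static String firstColumnText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Parse from an in-memory string rather than a file or URL.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList columns = doc.getElementsByTagName("column");
            if (columns.getLength() == 0) {
                return null;
            }
            // getTextContent keeps any line breaks inside the element.
            return columns.item(0).getTextContent();
        } catch (Exception e) {
            // Wrap parser and I/O errors so callers need no checked exceptions.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
                firstColumnText("<row><column>Header text</column></row>"));
    }
}
```

Unlike a regex, this keeps working if the element gains attributes or the surrounding whitespace changes, at the cost of requiring well-formed input.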

Could you say what you want to accomplish with your program?

Upvotes: 2

Aaron Maenpaa

Reputation: 123020

Write a regex like this:

"(regex1)(.*)(regex2)"

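... and pull out the middle group from the matcher (group 2 in this pattern, as long as regex1 contains no capturing groups of its own). To let "." match line breaks in the input, compile the pattern with Pattern.DOTALL.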

Using your example we can write a program like:

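package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {

    public static void main(String[] args) {
        // The delimiters are literal here, so only the middle part is a group.
        // DOTALL lets "." match line terminators as well.
        Pattern p = Pattern.compile(
                "<row><column>(.*)</column></row>",
                Pattern.DOTALL
            );

        Matcher matcher = p.matcher(
                "<row><column>Header\n\n\ntext</column></row>"
            );

        // matches() succeeds only if the entire input matches the pattern.
        if (matcher.matches()) {
            // Group 1 is the text between the delimiters.
            System.out.println(matcher.group(1));
        }
    }

}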

When run, this prints:

Header


text
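The program above hard-codes the delimiters and requires the whole input to match. For the general "between two regular expressions" case, a variation might look like the sketch below. The class Between and the method between are hypothetical names, not part of any library. It swaps in find() so the delimiters can appear anywhere in the input, a reluctant quantifier (.*?) so the match stops at the first closing delimiter, and a named group (Java 7 or newer) so capturing groups inside the delimiters do not shift the group index. It assumes neither delimiter defines a group named "body".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Between {

    // Hypothetical helper: returns the text between the first match of
    // startRegex and the nearest following match of endRegex, or null.
    public static String between(String input, String startRegex, String endRegex) {
        Pattern p = Pattern.compile(
                // Non-capturing groups keep alternations in the delimiters contained.
                "(?:" + startRegex + ")(?<body>.*?)(?:" + endRegex + ")",
                Pattern.DOTALL);
        Matcher m = p.matcher(input);
        // find() searches for a match anywhere in the input.
        return m.find() ? m.group("body") : null;
    }

    public static void main(String[] args) {
        String input = "<row><column>Header\n\n\ntext</column></row>";
        System.out.println(between(input, "<column>", "</column>"));
    }
}
```

With the greedy ".*" of the original pattern, an input containing several column elements would yield everything from the first opening delimiter to the last closing one. The reluctant form returns only the first element's text.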

Upvotes: 24
