Reputation:
In Java, is there a simple way to extract a substring by specifying the regular expression delimiters on either side, without including the delimiters in the final substring?
For example, if I have a string like this:
<row><column>Header text</column></row>
what is the easiest way to extract the substring:
Header text
Please note that the substring may contain line breaks.
Thanks!
Upvotes: 11
Views: 24140
Reputation: 75426
You should not use regular expressions to parse XML; this will eventually break unless the input is strictly controlled.
The easiest approach is probably to parse the XML into a DOM tree (Java 1.4 and newer include an XML parser) and then navigate the tree to pick out what you need.
Perhaps you could tell us what you want to accomplish with your program?
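To make the DOM suggestion concrete, here is a minimal sketch using the JAXP parser that ships with the JDK. The class name DomExtract and the helper firstColumnText are made up for illustration, and the sketch assumes the input is well-formed XML with at least one <column> element:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomExtract {
    // Hypothetical helper (not from the question): returns the text content
    // of the first <column> element, or null if the document has none.
    static String firstColumnText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Parse from a String by wrapping it in a StringReader.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList columns = doc.getElementsByTagName("column");
        if (columns.getLength() == 0) {
            return null;
        }
        // getTextContent() keeps line breaks that appear inside the element.
        return columns.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<row><column>Header\ntext</column></row>";
        System.out.println(firstColumnText(xml));
    }
}
```

Unlike a regex, this keeps working if the element gains attributes or the surrounding whitespace changes, but it throws an exception on malformed input.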
Upvotes: 2
Reputation: 123020
Write a regex like this:
"(regex1)(.*)(regex2)"
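... and pull out the middle group from the matcher (to let . match line breaks in the input, use Pattern.DOTALL). Note that .* is greedy: if the closing delimiter occurs more than once, the group runs up to its last occurrence. Use .*? to stop at the first one instead.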
Using your example we can write a program like:
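package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {
    public static void main(String[] args) {
        // DOTALL lets '.' match line terminators, so the captured
        // text may span several lines.
        Pattern p = Pattern.compile(
            "<row><column>(.*)</column></row>",
            Pattern.DOTALL
        );
        Matcher matcher = p.matcher(
            "<row><column>Header\n\n\ntext</column></row>"
        );
        // matches() succeeds only if the whole input fits the pattern.
        if (matcher.matches()) {
            // Group 1 is the text between the delimiters.
            System.out.println(matcher.group(1));
        }
    }
}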
When run, this prints the following (the two blank lines come from the \n\n\n in the input):
Header


text
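For the general case of two arbitrary delimiter regexes, the same idea can be wrapped in a small helper. This is only a sketch: the class name Between and the method between are made up for illustration. It differs from the program above in two ways. It uses find() so the delimiters can appear anywhere in the input, and it uses a reluctant .*? so the match stops at the first closing delimiter. The (?s:...) group applies DOTALL only to the middle part, and the named group means capturing groups inside the delimiter regexes do not shift the group index:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Between {
    // Hypothetical helper: returns the text between the first match of
    // `open` and the nearest following match of `close`, or null if no
    // such pair exists. Both arguments are regular expressions.
    static String between(String input, String open, String close) {
        Pattern p = Pattern.compile(
            "(?:" + open + ")"       // left delimiter, not captured
            + "(?<body>(?s:.*?))"    // shortest possible body, may span lines
            + "(?:" + close + ")");  // right delimiter, not captured
        Matcher m = p.matcher(input);
        return m.find() ? m.group("body") : null;
    }

    public static void main(String[] args) {
        String input = "<row><column>A</column><column>B\nC</column></row>";
        // Prints only the first column's text.
        System.out.println(between(input, "<column>", "</column>"));
    }
}
```

If the delimiters are literal strings rather than patterns, passing them through Pattern.quote() first avoids surprises with characters such as . or (.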
Upvotes: 24