I am trying to capture the text inside XML tags like ..., as well as the content of strings like "[[A]]" that appear inside those tags. So far my patterns are:
Pattern titleText = Pattern.compile("<title>([A-Z])</title>");
Pattern extractLink = Pattern.compile("(\[\[([A-Z])\]\])");
I'm getting an error on the second pattern because of the \ characters. However, I'm not sure how to tell the regex that I want to escape the [ and ] characters so that it captures the text inside them.
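The compile error comes from Java itself rather than the regex engine: \[ is not a legal escape sequence in a Java string literal, so the backslash has to be doubled ("\\[") for the regex to receive \[, which matches a literal [. A minimal sketch, assuming the bracketed text is one or more uppercase letters (the original [A-Z] matches exactly one letter); the class EscapeDemo and method links are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    // "\\[" in Java source becomes \[ in the regex, which matches a literal '['.
    // Group 1 holds the uppercase letters between "[[" and "]]".
    static final Pattern LINK = Pattern.compile("\\[\\[([A-Z]+)\\]\\]");

    static List<String> links(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        while (m.find()) {          // each call advances to the next match
            found.add(m.group(1));  // group(1) excludes the brackets
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "<title>random text [[A]] more random text [[B]] ...</title>";
        System.out.println(links(input)); // prints [A, B]
    }
}
```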
An example of input I am trying to capture is:
<title>random text [[A]] more random text [[B]] ...</title>
Where [[A]] and [[B]] can occur any number of times, and I am trying to find all of them.
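Both steps can be combined: find each title element first, then loop over the links inside it. A sketch, assuming titles are not nested and that a link is any text between "[[" and the next "]]"; TitleLinks and linksInTitles are hypothetical names. For real XML documents, an XML parser is more robust than regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleLinks {
    // Reluctant (.*?) stops at the first closing tag; DOTALL lets a title span lines.
    static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>", Pattern.DOTALL);
    // Reluctant (.+?) stops at the first "]]", so "[[A]] x [[B]]" yields two matches.
    static final Pattern LINK = Pattern.compile("\\[\\[(.+?)\\]\\]");

    static List<String> linksInTitles(String xml) {
        List<String> found = new ArrayList<>();
        Matcher title = TITLE.matcher(xml);
        while (title.find()) {
            // Run the link pattern only over the text between the title tags.
            Matcher link = LINK.matcher(title.group(1));
            while (link.find()) {
                found.add(link.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<title>random text [[A]] more random text [[B]] ...</title>"
                + "<body>[[C]]</body>";
        System.out.println(linksInTitles(xml)); // prints [A, B]
    }
}
```

The [[C]] outside the title is skipped because the link pattern only ever sees group 1 of a title match.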
Any help/advice would be greatly appreciated.
Answer:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestTag {
    public static void main(String[] args) {
        String INPUT = "<title>random text [[ABBA]] more random text [[B]] ...</title>";
        // Matches "[[", then any run of non-whitespace characters, then "]]"
        String REGEX = "(\\[\\[\\S*]])";
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT);
        // find() advances to the next match on each call
        while (m.find()) {
            // Drop the leading "[[" and the trailing "]]" from each match
            System.out.println(" data: "
                    + INPUT.substring(m.start() + 2, m.end() - 2));
        }
    }
}
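A variant of this answer reads the text from a capture group instead of trimming two characters off each end. It also shows an edge case worth knowing: \S* is greedy, so when two links sit next to each other with no whitespace between them, it runs on to the last "]]". The reluctant form \S*? stops at the first one. ReluctantGroup and extract are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantGroup {
    static List<String> extract(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1)); // text captured between "[[" and "]]"
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "[[A]][[B]]"; // two links with no whitespace between them
        // Greedy \S* runs to the last "]]" it can find.
        System.out.println(extract("\\[\\[(\\S*)\\]\\]", input));  // prints [A]][[B]
        // Reluctant \S*? stops at the first "]]".
        System.out.println(extract("\\[\\[(\\S*?)\\]\\]", input)); // prints [A, B]
    }
}
```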
Answer:
Java can't return a capturing group an arbitrary number of times from a single match: if the group repeats, only its last repetition is kept, so you would have to spell out each group in the pattern. As an alternative, here is a solution that splits the string on the opening [[ and then reads each item up to its closing ]]:
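import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitExample {
    public static void main(String[] args) {
        // Reluctant .*? stops at the first </title>
        Pattern titleText = Pattern.compile("<title>(.*?)</title>");
        String input = "<title>random text [[A]] more random text [[B]] ...</title>";
        String text = "";
        Matcher m = titleText.matcher(input);
        if (m.find()) {
            text = m.group(1); // contents between the title tags
        }
        // Split on "[[": every piece after the first starts with a bracketed item
        String[] parts = text.split("\\[\\[");
        for (int i = 1; i < parts.length; ++i) {
            int index = parts[i].indexOf("]]");
            if (index < 0) {
                continue; // no closing "]]" in this piece, so skip it
            }
            String match = parts[i].substring(0, index);
            System.out.println("Found a match: " + match);
        }
    }
}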
Output:
Found a match: A
Found a match: B
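To illustrate the first point of this answer: when a capturing group is repeated, Java's Matcher reports only the text from the last repetition. The pattern below is a hypothetical one built for the demonstration; it consumes both links in a single match, yet group(1) returns only "B". Calling find() in a loop, as in the previous answer, is the usual way to collect every occurrence.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroup {
    // Hypothetical pattern: a non-capturing group repeated with +, where each
    // repetition consumes some non-'[' text followed by one [[...]] link.
    static final Pattern REPEATED = Pattern.compile("(?:[^\\[]*\\[\\[([A-Z]+)\\]\\])+");

    static String lastCapture(String text) {
        Matcher m = REPEATED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "random text [[A]] more random text [[B]] ...";
        // The pattern matched both links, yet group(1) holds only the last one.
        System.out.println(lastCapture(text)); // prints B
    }
}
```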