Zach

Reputation: 5083

Java Regex, capturing items inside of "[...]"

I am trying to capture text inside of XML tags like ... and the content inside of strings like "[[A]]" that would be inside of the XML tags. So far my patterns are as follows:

    Pattern titleText = Pattern.compile("<title>([A-Z])</title>");
    Pattern extractLink = Pattern.compile("(\[\[([A-Z])\]\])");

I'm getting an error on the second pattern, and it's because of the backslashes (\). However, I'm not sure how to tell the regex engine that I want to escape the [ and ] characters so that it captures the text inside them.

An example of input I am trying to capture is:

<title>random text [[A]] more random text [[B]] ...</title>

Where [[A]] and [[B]] can happen any number of times, and I am trying to find all of them.

Any help/advice would be greatly appreciated.

Upvotes: 1

Views: 62

Answers (2)

akh

Reputation: 1

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class TestTag {

    public static void main(String[] args) {
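        String INPUT = "<title>random text [[ABBA]] more random text [[B]] ...</title>";
        // "\\[" in a Java string literal is the regex \[ (a literal '[').
        // The reluctant .*? stops at the first "]]", so adjacent items such as
        // [[A]][[B]] are matched separately and items may contain spaces.
        String REGEX = "\\[\\[.*?]]";

        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT);

        while (m.find()) {
            // Trim the two-character "[[" and "]]" delimiters from the match.
            System.out.println(" data: "
                + INPUT.substring(m.start() + 2, m.end() - 2));
        }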

    }
}
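
On the escaping problem from the question: in Java source a backslash is first consumed by the string literal, so the regex \[ has to be written as "\\[". A lone "\[" is not a valid string escape, which is why the second pattern in the question is rejected. Here is a minimal sketch of the two usual ways to get a literal "[[" into a pattern. The class name EscapeDemo and the helper countMatches are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    // Written by hand: "\\[" in Java source is the two-character regex \[ .
    static final String ESCAPED = "\\[\\[";
    // Generated: Pattern.quote wraps the text in \Q...\E so it matches literally.
    static final String QUOTED = Pattern.quote("[[");

    // Hypothetical helper: counts how many times the regex matches in the text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "random text [[A]] more random text [[B]] ...";
        // Both forms match the literal "[[" and therefore give the same count.
        System.out.println(countMatches(ESCAPED, text));
        System.out.println(countMatches(QUOTED, text));
    }
}
```

Pattern.quote is handy when the delimiter comes from a variable, but it cannot be mixed with groups or quantifiers inside the quoted part, so the hand-escaped form is usually what you want here.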

Upvotes: 0

Tim Biegeleisen

Reputation: 521997

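Within a single match, Java cannot return a capturing group an arbitrary number of times: a repeated group keeps only its last capture, so each group you want back must be written out in the pattern. However, here is an alternative solution which first extracts the title text and then splits it on the bracketed items you want to match: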

Pattern titleText = Pattern.compile("<title>(.*?)</title>");
String input = "<title>random text [[A]] more random text [[B]] ...</title>";
String text = "";

Matcher m = titleText.matcher(input);
if (m.find()) {
    text = m.group(1);
}

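// Each element after the first starts just past a "[[" delimiter.
String[] parts = text.split("\\[\\[");

for (int i = 1; i < parts.length; ++i) {
    int index = parts[i].indexOf("]]");
    if (index < 0) {
        continue; // no closing "]]", so this is not a complete item
    }
    String match = parts[i].substring(0, index);
    System.out.println("Found a match: " + match);
}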

Output:

Found a match: A
Found a match: B
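
For comparison, calling Matcher.find() in a loop with a capturing group should give the same result without splitting, since each call to find() moves on to the next match and group(1) is refilled every time. This is only a sketch under a couple of assumptions: items never contain "]]", and the title sits on one line (the . in the title pattern does not cross line breaks by default). The class name BracketItems and the helper extractItems are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketItems {
    // Captures everything between <title> and </title> (reluctantly).
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");
    // Captures the text between "[[" and the nearest "]]".
    private static final Pattern ITEM = Pattern.compile("\\[\\[(.*?)]]");

    // Hypothetical helper: returns every [[...]] item found inside <title> elements.
    static List<String> extractItems(String input) {
        List<String> items = new ArrayList<>();
        Matcher title = TITLE.matcher(input);
        while (title.find()) {
            // Search only the title text, one item per find() call.
            Matcher item = ITEM.matcher(title.group(1));
            while (item.find()) {
                items.add(item.group(1));
            }
        }
        return items;
    }

    public static void main(String[] args) {
        String input = "<title>random text [[A]] more random text [[B]] ...</title>";
        for (String match : extractItems(input)) {
            System.out.println("Found a match: " + match);
        }
    }
}
```

Text outside the title is ignored because the inner matcher only ever sees group(1) of the title match.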

Upvotes: 1
