Reputation: 11066
I'm trying to parse a string for any occurrences of Markdown-style links, i.e. [text](link). I'm able to get the first of the links in a string, but if I have multiple links I can't access the rest. Here is what I've tried (you can run it on ideone):
Pattern p;
try {
    p = Pattern.compile("[^\\[]*\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)(?:.*)");
} catch (PatternSyntaxException ex) {
    System.out.println(ex);
    throw ex;
}
Matcher m1 = p.matcher("Hello");
Matcher m2 = p.matcher("Hello [world](ladies)");
Matcher m3 = p.matcher("Well, [this](that) has [two](too many) keys.");
System.out.println("m1 matches: " + m1.matches()); // false
System.out.println("m2 matches: " + m2.matches()); // true
System.out.println("m3 matches: " + m3.matches()); // true
System.out.println("m2 text: " + m2.group("text")); // world
System.out.println("m2 link: " + m2.group("link")); // ladies
System.out.println("m3 text: " + m3.group("text")); // this
System.out.println("m3 link: " + m3.group("link")); // that
System.out.println("m3 end: " + m3.end()); // 44 - I want 18
System.out.println("m3 count: " + m3.groupCount()); // 2 - I want 4
System.out.println("m3 find: " + m3.find()); // false - I want true
I know I can't have repeating groups, but I figured find() would work; however, it does not behave as I expected. How can I modify my approach so that I can parse each link?
Upvotes: 2
Views: 1256
Reputation: 2306
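Can't you go through the matches one by one, starting each search at the index after the previous match? Your pattern ends with (?:.*), so the first match consumes the rest of the string and nothing is left for the next search. Drop the leading and trailing parts and use this regex: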
\[(?<text>[^\]]*)\]\((?<link>[^\)]*)\)
The find() method scans the input for the next subsequence that matches the pattern, so a match can be a substring of the whole string. Each call to find() moves on to the next match. matches(), by contrast, tries to match the entire string and fails if it doesn't. Use something like this:
while (m.find()) {
    String text = m.group("text"); // link text, e.g. "this"
    String link = m.group("link"); // link target, e.g. "that"
}
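Putting that together, a minimal sketch might look like the following. The class name MarkdownLinks and the helper extract are made up for illustration; the pattern is the one given above, written as a Java string literal. It assumes Java 7 or later, since it relies on named groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkdownLinks {
    // Unanchored pattern: no leading [^\[]* and no trailing .*,
    // so each match ends at the closing ')' of one link.
    private static final Pattern LINK =
            Pattern.compile("\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)");

    // Hypothetical helper: returns one {text, link} pair per link found.
    static List<String[]> extract(String input) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        // Each find() resumes where the previous match ended.
        while (m.find()) {
            links.add(new String[] { m.group("text"), m.group("link") });
        }
        return links;
    }

    public static void main(String[] args) {
        String input = "Well, [this](that) has [two](too many) keys.";
        for (String[] link : extract(input)) {
            System.out.println(link[0] + " -> " + link[1]);
        }
        // prints:
        // this -> that
        // two -> too many
    }
}
```

Note that groupCount() will still report 2 here: it counts the groups in the pattern, not the number of matches, so the number of links is simply how many times find() returned true.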
Upvotes: 1
Reputation: 31045
The regular expression I used to match what you need (without groups) is \[\w+\]\(.+\)
It is deliberately simple, just to show the idea. Basically it does:
\[
\w+
\]
This looks for the pattern [blabla]
Then the same with parentheses...
\(
.+
\)
So it matches (ble...ble...)
Now, if you want to store the matches in groups, you can use additional parentheses like this:
(\[\w+\])(\(.+\))
This way the words and the links are stored in separate groups.
Hope this helps.
I've tried it on regexplanet.com and it works.
Update: workaround .*(\[\w+\])(\(.+\))*.*
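One caveat worth checking with .+ inside the parentheses: it is greedy, so on a string with several links it runs on to the last closing parenthesis. The sketch below compares it with a negated character class, [^)]+, which is a substitution made here for illustration and not part of the pattern above. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsNegated {
    // Hypothetical helper: returns the first substring matching regex, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "Well, [this](that) has [two](too many) keys.";

        // Greedy .+ swallows everything up to the LAST ')'.
        System.out.println(firstMatch("\\[\\w+\\]\\(.+\\)", input));
        // prints: [this](that) has [two](too many)

        // [^)]+ cannot cross a ')', so the match stops at the first link.
        System.out.println(firstMatch("\\[\\w+\\]\\([^)]+\\)", input));
        // prints: [this](that)
    }
}
```

With the negated class, calling find() in a loop yields each link separately.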
Upvotes: 0