I've been asked to match HTML tags of the following forms with a regular expression:
A. <TAG ATTRIBUTE="VALUE"/> or
B. <TAG ATTRIBUTE="VALUE"> or
C. <TAG/> or
D. <TAG> or
E. </TAG>
Here is my pattern:
/** A pattern that matches a simple HTML markup. Group 1 matches
 *  the initial '/', if present. Group 2 matches the tag. Group
 *  3 matches the attribute name, if present. Group 4 matches the
 *  attribute value (without quotes). Group 5 matches the closing
 *  '/', if present. */
public static final String HTML_P3 =
    "<(/)?\\s*([a-zA-Z]+)\\s*([a-zA-Z]+)?\\s*=?\\s*\\\"?([^\\\"]+)?\\\"?\\s*(/)?>";
Here is a snippet of the test I was given:
public static void p3(String name, String markup) throws IOException {
    out.println("Problem #3.");
    Scanner inp = new Scanner(new FileReader(name));
    while (inp.findWithinHorizon(markup, 0) != null) {
        MatchResult mat = inp.match();
        if (mat.group(1) != null
            && (mat.group(5) != null || mat.group(3) != null)) {
            out.printf("Bad markup.%n");
            continue;
        }
        out.printf("Tag: %s", mat.group(2));
        if (mat.group(3) != null) {
            out.printf(", Attribute: %s, Value: \"%s\"",
                       mat.group(3), mat.group(4));
        }
        if (mat.group(5) != null || mat.group(1) != null) {
            out.print(" end");
        }
        out.println();
    }
    out.println();
}
Here is the input:
This is a simple <i>mark-up</i>. Next comes
one <input value="3"/> that's closed,
followed by a list of names:
<ol color="green">
<li> Tom </li>
<li > Dick </li>
<li> Harry </li>
</ol>
The correct output should be:
Problem #3.
Tag: i
Tag: i end
Tag: input, Attribute: value, Value: "3" end
Tag: ol, Attribute: color, Value: "green"
Tag: li
Tag: li end
Tag: li
Tag: li end
Tag: li
Tag: li end
Tag: ol end
However, my pattern never matches any closing tag. Here is my output:
Problem #3.
Tag: i
Tag: input, Attribute: value, Value: "3" end
Tag: ol, Attribute: color, Value: "green"
Tag: li
I've tried my pattern on regexpal.com and it seems to match everything there. Can someone shed some light on this, please?
First of all, since you are writing a regex pattern for Java, use a Java regex tester.
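I'm not a Java expert, but you don't need to triple-escape the double quotes. It is harmless, but in a Java string literal \" already produces a plain " character, and a " needs no escaping in a regex.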
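A small check of the escaping point, as a sketch: the Java literal "\\\"" is the two characters \" while the literal "\"" is the single character ", yet both compile to a regex that matches exactly one double quote. The class name QuoteEscapeDemo is made up for this illustration.

```java
import java.util.regex.Pattern;

/** Shows that an escaped quote (\") and a plain quote (") mean the same thing to java.util.regex. */
public class QuoteEscapeDemo {
    // Java literal "\\\"" is the two characters \" (backslash + quote).
    static final String ESCAPED_QUOTE = "\\\"";
    // Java literal "\"" is the single character " (quote only).
    static final String PLAIN_QUOTE = "\"";

    public static void main(String[] args) {
        System.out.println(ESCAPED_QUOTE.length()); // 2
        System.out.println(PLAIN_QUOTE.length());   // 1
        // Both patterns match a string consisting of one double quote.
        System.out.println(Pattern.matches(ESCAPED_QUOTE, "\"")); // true
        System.out.println(Pattern.matches(PLAIN_QUOTE, "\""));   // true
    }
}
```

So in HTML_P3 every \\\" could be written as \" without changing what the pattern matches.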
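One of the problems in your pattern is that you make each part of the attribute optional on its own: ([a-zA-Z]+)?\\s*=?\\s*\"?([^\"]+)?\"?
instead of grouping them all in a non-capturing group:
(?:([a-zA-Z]+)\\s*=\\s*\"([^\"]+)\")?
(if there is no attribute, there is no equals sign, no quotes, and no value either). With every part optional separately, ([^\"]+)? can match even when the tag has no attribute, and because [^\"] also matches > and <, the match that starts at <i> runs on past </i> to the last > before the next double quote (or the end of the input). That is where your closing tags go: they are swallowed by the previous match.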
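A quick way to confirm this is to print the whole text (group 0) of every match of the original HTML_P3. The sketch below runs Matcher.find over the sample input held in a String, which should find the same matches as the Scanner.findWithinHorizon(markup, 0) loop in the test. The class name OriginalPatternDemo and the helper wholeMatches are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Prints the full text of every match of the question's HTML_P3 pattern. */
public class OriginalPatternDemo {
    // The question's pattern, copied unchanged.
    static final String HTML_P3 =
        "<(/)?\\s*([a-zA-Z]+)\\s*([a-zA-Z]+)?\\s*=?\\s*\\\"?([^\\\"]+)?\\\"?\\s*(/)?>";

    // The question's sample input, joined with newlines.
    static final String INPUT = String.join("\n",
        "This is a simple <i>mark-up</i>. Next comes",
        "one <input value=\"3\"/> that's closed,",
        "followed by a list of names:",
        "<ol color=\"green\">",
        "<li> Tom </li>",
        "<li > Dick </li>",
        "<li> Harry </li>",
        "</ol>");

    /** Returns the full text (group 0) of each successive match of regex in input. */
    static List<String> wholeMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String match : wholeMatches(HTML_P3, INPUT)) {
            // Show newlines as \n so a multi-line match prints on one line.
            System.out.println("[" + match.replace("\n", "\\n") + "]");
        }
    }
}
```

It should print only four matches. The first is <i>mark-up</i>, and the fourth runs from the first <li> all the way to the final </ol>, so every </i>, </li>, and </ol> ends up inside an earlier match instead of being reported on its own.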
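You can try this (written as a Java string literal):
"(?i)<(/)?([a-z1-6]+)(?:\\s+([a-z]+)\\s*=\\s*\"([^\"]*+)\")?\\s*(/)?>"
The \\s* just before (/)?> lets tags with trailing whitespace, such as <li >, match as well.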
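For an end-to-end check, here is a sketch that runs the same logic as the question's p3 test against the sample input. Assumptions: the input comes from a String via new Scanner(String) instead of a file, the lines are collected into a list instead of being written to out, and the names FixedPatternDemo and p3(String, String) returning List<String> are made up for this illustration. FIXED is the answer's pattern with \\s* placed right before (/)?> so that <li > also matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;

/** Runs the question's p3 checks on the sample input with a corrected pattern. */
public class FixedPatternDemo {
    // The answer's pattern, with \\s* placed just before (/)?> so "<li >" matches too.
    static final String FIXED =
        "(?i)<(/)?([a-z1-6]+)(?:\\s+([a-z]+)\\s*=\\s*\"([^\"]*+)\")?\\s*(/)?>";

    // The question's sample input, joined with newlines.
    static final String INPUT = String.join("\n",
        "This is a simple <i>mark-up</i>. Next comes",
        "one <input value=\"3\"/> that's closed,",
        "followed by a list of names:",
        "<ol color=\"green\">",
        "<li> Tom </li>",
        "<li > Dick </li>",
        "<li> Harry </li>",
        "</ol>");

    /** Same checks as the question's p3, but reads from a String and returns the output lines. */
    static List<String> p3(String text, String markup) {
        List<String> lines = new ArrayList<>();
        try (Scanner inp = new Scanner(text)) {
            while (inp.findWithinHorizon(markup, 0) != null) {
                MatchResult mat = inp.match();
                // A closing tag must not carry an attribute or a trailing '/'.
                if (mat.group(1) != null
                        && (mat.group(5) != null || mat.group(3) != null)) {
                    lines.add("Bad markup.");
                    continue;
                }
                StringBuilder line = new StringBuilder("Tag: ").append(mat.group(2));
                if (mat.group(3) != null) {
                    line.append(", Attribute: ").append(mat.group(3))
                        .append(", Value: \"").append(mat.group(4)).append('"');
                }
                // Both "</tag>" and "<tag .../>" are reported as an end.
                if (mat.group(5) != null || mat.group(1) != null) {
                    line.append(" end");
                }
                lines.add(line.toString());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("Problem #3.");
        p3(INPUT, FIXED).forEach(System.out::println);
    }
}
```

With this pattern, the list should contain the eleven lines of the correct output from the question, including every closing tag.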