Reputation: 5156
I'm practicing parsing XML.
My sentence is
<SINGER>I.O.I</SINGER> came back on <MONTH>May</MONTH> 4, <YEAR>2016</YEAR>.
I used both
Pattern.compile("<[^/^>.]+>[^<^>.]+</[^>.]+>");
and
Pattern.compile("<[^/^>.]+>[^<^>\\..]+</[^>.]+>");
However, the regexes could not match
<SINGER>I.O.I</SINGER>
I think my regexes misbehave because of those dots, since they do match
<SINGER>I-O-I</SINGER>
What should I do?
Thank you.
Upvotes: 1
Views: 1015
Reputation: 85361
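The pattern <[^/^>.]+>[^<^>.]+</[^>.]+>
means:
<
one or more characters other than /, ^, > and .
>
one or more characters other than <, ^, > and .
</
one or more characters other than > and .
>
So it won't match <SINGER>I.O.I</SINGER>, because the text between the tags contains dots, which the second character class excludes.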
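To make the breakdown above concrete, here is a minimal sketch that runs the question's pattern, unchanged, against both inputs. The class name NegatedClassDemo and the helper found are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // The pattern from the question, copied unchanged.
    static final Pattern ORIGINAL = Pattern.compile("<[^/^>.]+>[^<^>.]+</[^>.]+>");

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    static boolean found(String input) {
        return ORIGINAL.matcher(input).find();
    }

    public static void main(String[] args) {
        // The dots in the content are excluded by the second character class.
        System.out.println(found("<SINGER>I.O.I</SINGER>")); // false
        // Hyphens are not excluded, so this one matches.
        System.out.println(found("<SINGER>I-O-I</SINGER>")); // true
    }
}
```

Inside the square brackets the dot is an ordinary character, so listing it after ^ excludes it rather than allowing "any character".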
You probably want something like <[^>]+>[^<]*</[^>]+>
as a quick-and-dirty way to extract data from an XML tag.
Then you need to use Pattern and Matcher properly:
Pattern p = Pattern.compile("<[^>]+>([^<]*)</[^>]+>");
Matcher m = p.matcher("<SINGER>I.O.I</SINGER> came back on <MONTH>May</MONTH> 4, <YEAR>2016</YEAR>.");
while (m.find()) {
    System.out.println(m.group(1)); // group(1) is the text between the tags
}
This will print:
I.O.I
May
2016
Upvotes: 2
Reputation: 19622
If you want the dot, or any other character with a special meaning in regexes, to be treated as a normal character, you have to escape it with a backslash. Since regexes in Java are written as ordinary Java string literals, you also need to escape the backslash itself, so you need two backslashes, e.g. \\.
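As a rough illustration of the escaping rule, the sketch below compares an unescaped and an escaped dot outside a character class. The class name EscapeDotDemo and the helper matchesWhole are hypothetical; Pattern.matches is the standard java.util.regex call. The last line also shows that inside square brackets the dot is already literal, so no backslash is needed there.

```java
import java.util.regex.Pattern;

public class EscapeDotDemo {
    // Hypothetical helper: true if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped dot: matches any single character, including '-'.
        System.out.println(matchesWhole("I.O.I", "I-O-I"));     // true
        // Escaped dot ("\\." in Java source is the regex \.): matches only '.'.
        System.out.println(matchesWhole("I\\.O\\.I", "I-O-I")); // false
        System.out.println(matchesWhole("I\\.O\\.I", "I.O.I")); // true
        // Inside a character class the dot is literal without escaping.
        System.out.println(matchesWhole("[.]", "x"));           // false
    }
}
```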
Upvotes: 0