ghchoi

Reputation: 5156

Java regex dot does not match the real dot character (.)

I'm practicing parsing XML.

My sentence is:

<SINGER>I.O.I</SINGER> came back on <MONTH>May</MONTH> 4, <YEAR>2016</YEAR>.

I used both

Pattern.compile("<[^/^>.]+>[^<^>.]+</[^>.]+>");

and

Pattern.compile("<[^/^>.]+>[^<^>\\..]+</[^>.]+>");

However, neither regex matches

<SINGER>I.O.I</SINGER>

I suspect the dots are the problem, because the same regexes do match

<SINGER>I-O-I</SINGER>

What should I do?

Thank you.

Upvotes: 1

Views: 1015

Answers (2)

rustyx

Reputation: 85361

The pattern <[^/^>.]+>[^<^>.]+</[^>.]+> means:

  1. <
  2. One or more characters except /, ^, > and .
  3. >
  4. One or more characters except <, ^, > and .
  5. </
  6. One or more characters except > and .
  7. >

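Inside a character class, . is a literal dot rather than "any character", so step 4 rejects the dots in I.O.I and the pattern won't match <SINGER>I.O.I</SINGER>. The same applies to your second pattern: inside the brackets, \\. and . both stand for a literal dot.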
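A small self-contained check of this explanation: it compiles both patterns from the question and runs them against the dotted and the hyphenated tag. The class name DotInCharClass is only for illustration.

```java
import java.util.regex.Pattern;

public class DotInCharClass {
    // The two patterns from the question. Inside [...], '.' and '\.' both mean a literal dot.
    static final Pattern FIRST = Pattern.compile("<[^/^>.]+>[^<^>.]+</[^>.]+>");
    static final Pattern SECOND = Pattern.compile("<[^/^>.]+>[^<^>\\..]+</[^>.]+>");

    public static void main(String[] args) {
        String[] inputs = {"<SINGER>I.O.I</SINGER>", "<SINGER>I-O-I</SINGER>"};
        for (String input : inputs) {
            // matches() requires the whole input to fit the pattern
            System.out.println(input
                    + " first=" + FIRST.matcher(input).matches()
                    + " second=" + SECOND.matcher(input).matches());
        }
        // A bracketed dot matches only the '.' character, not "any character"
        System.out.println("x".matches("[.]")); // false
        System.out.println(".".matches("[.]")); // true
    }
}
```

Both patterns report false for I.O.I and true for I-O-I, which is exactly the behaviour described in the question.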

You probably want something like <[^>]+>[^<]*</[^>]+> as a quick-and-dirty way to extract data from an XML tag.
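A quick sketch of what the looser pattern accepts when applied to a whole string. The class name LooseTagPattern is hypothetical; the last line illustrates why it is only quick-and-dirty: the opening and closing tag names are never compared.

```java
import java.util.regex.Pattern;

public class LooseTagPattern {
    // Opening tag, any text without '<', closing tag (tag names are NOT checked to agree)
    static final Pattern TAG = Pattern.compile("<[^>]+>[^<]*</[^>]+>");

    public static void main(String[] args) {
        System.out.println(TAG.matcher("<SINGER>I.O.I</SINGER>").matches()); // true
        System.out.println(TAG.matcher("<SINGER>I-O-I</SINGER>").matches()); // true
        System.out.println(TAG.matcher("<MONTH></MONTH>").matches());        // true: [^<]* allows empty content
        System.out.println(TAG.matcher("<A>x</B>").matches());               // true: mismatched names still pass
    }
}
```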

Then you need to use Pattern and Matcher properly:

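    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Group 1 captures the text between an opening and a closing tag
    Pattern p = Pattern.compile("<[^>]+>([^<]*)</[^>]+>");
    Matcher m = p.matcher("<SINGER>I.O.I</SINGER> came back on <MONTH>May</MONTH> 4, <YEAR>2016</YEAR>.");
    while (m.find()) {
        System.out.println(m.group(1));
    }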

This prints:

    I.O.I
    May
    2016
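If you also need the tag name, one possible refinement (an assumption on my part, not part of the answer above) captures the name in group 1 and uses the backreference \1 so the closing tag must repeat it. Like the original, it only handles simple, non-nested elements; TagExtractor and extract are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagExtractor {
    // Group 1: tag name, group 2: text content; \1 forces the closing tag to repeat the name
    static final Pattern TAG = Pattern.compile("<([^>/]+)>([^<]*)</\\1>");

    // Returns "NAME=content" for each simple (non-nested) element found in the input
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            result.add(m.group(1) + "=" + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "<SINGER>I.O.I</SINGER> came back on <MONTH>May</MONTH> 4, <YEAR>2016</YEAR>.";
        System.out.println(extract(s));          // [SINGER=I.O.I, MONTH=May, YEAR=2016]
        System.out.println(extract("<A>x</B>")); // [] because the names differ
    }
}
```

For anything beyond practice, a real XML parser is the safer choice, since regexes cannot follow nesting.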

Upvotes: 2

Rahul Singh

Reputation: 19622

If you want the dot, or any other character with a special meaning in regexes, to be treated as a normal character, you have to escape it with a backslash. Since Java regexes are written as ordinary Java string literals, the backslash itself must also be escaped, so you need two backslashes, i.e. \\.
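A minimal illustration of the difference between an unescaped and an escaped dot outside a character class. Pattern.quote is an alternative from the standard library (not mentioned in the answer) that makes a whole string literal; EscapeDot is an illustrative class name.

```java
import java.util.regex.Pattern;

public class EscapeDot {
    public static void main(String[] args) {
        // Unescaped '.' matches any character (except line terminators by default)
        System.out.println("I-O-I".matches("I.O.I"));                 // true
        // "\\." in Java source is the regex \. and matches only a literal dot
        System.out.println("I-O-I".matches("I\\.O\\.I"));             // false
        System.out.println("I.O.I".matches("I\\.O\\.I"));             // true
        // Pattern.quote makes every character in the string literal
        System.out.println("I-O-I".matches(Pattern.quote("I.O.I"))); // false
        System.out.println("I.O.I".matches(Pattern.quote("I.O.I"))); // true
    }
}
```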

Upvotes: 0
