DickDickSean

Reputation: 1

Grouping regular expression

Here is my question:

I have a very long string containing many values delimited by different tags. The values include Chinese text, English words, and digits.

I want to split the string by a specific pattern. For example, I want to find the pattern <f"number">xxxx<f"number">, where xxxx is Chinese, English, digits, or any other symbol except "<" and ">", since those two symbols delimit the tags.

However, I get a strange result with this pattern. It does not seem to recognize the first two tags, only a later one:

String a = "<f\"number\">4  <f\"number\"><f$n0>14   <h85><f$n0>4    <f$n0>2 <f$n0>2 7   -<f\"Times-Roman\">7<f\"number\">";
Pattern p = Pattern.compile("<f\"number\">[\\P{sc=Han}*\\p{sc=Han}*[a-z]*[A-Z]*[0-9]*^<>]*<f\"number\">");
Matcher m = p.matcher(a);

while(m.find()){
    System.out.println(m.group());
}

The output is identical to my String a.

Upvotes: 0

Views: 80

Answers (1)

Toto

Reputation: 91385

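The character class [\\P{sc=Han}*\\p{sc=Han}*[a-z]*[A-Z]*[0-9]*^<>]* matches zero or more of any character, because \\P{sc=Han} and \\p{sc=Han} are complements: together they cover every character, including < and >. Note also that inside a character class, * and a ^ that is not in first position are literal characters, not a quantifier and a negation.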
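To see this concretely, here is a minimal sketch that tests the question's character class against single characters. The class name CharClassDemo and the helper matchesOneChar are made up for illustration; the trailing * quantifier is dropped so that each test covers exactly one character.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // The character class from the question, without the trailing '*',
    // so that matches() checks exactly one character.
    static final Pattern QUESTION_CLASS =
        Pattern.compile("[\\P{sc=Han}*\\p{sc=Han}*[a-z]*[A-Z]*[0-9]*^<>]");

    // Hypothetical helper: true if the whole input is one character in the class.
    static boolean matchesOneChar(String s) {
        return QUESTION_CLASS.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "\u4e2d" is a Han character; the others are not Han.
        String[] samples = {"<", ">", "4", " ", "*", "\u4e2d"};
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + matchesOneChar(s));
        }
    }
}
```

Every sample prints true, the tag delimiters < and > included, so the class places no restriction on what may appear between the two tags.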

I guess you want:

Pattern p = Pattern.compile("<f\"number\">[\\p{sc=Han}a-zA-Z0-9]*<f\"number\">");

You may also want to allow whitespace:

Pattern p = Pattern.compile("<f\"number\">[\\p{sc=Han}a-zA-Z0-9\\s]*<f\"number\">");

or:

Pattern p = Pattern.compile("<f\"number\">[^<]*<f\"number\">");
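As a rough check, the sketch below runs the negated-class pattern against the string from the question and collects every match. The class name TagMatchDemo and the helper findAll are hypothetical names introduced for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagMatchDemo {
    // Hypothetical helper: returns every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The sample string from the question.
        String a = "<f\"number\">4  <f\"number\"><f$n0>14   <h85><f$n0>4    "
                 + "<f$n0>2 <f$n0>2 7   -<f\"Times-Roman\">7<f\"number\">";

        // [^<]* cannot cross into another tag, so the match has to stop
        // at the nearest closing <f"number">.
        String regex = "<f\"number\">[^<]*<f\"number\">";

        for (String match : findAll(regex, a)) {
            System.out.println(match);
        }
    }
}
```

On this input the loop prints a single match, <f"number">4  <f"number">. Each match consumes its closing tag, so that tag cannot also serve as the opening tag of the next match. If ">" should be excluded from the value as well, [^<>]* is a drop-in variant.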

Upvotes: 2
