Reputation: 1
Here is my question:
I have a very long string containing many values delimited by different tags. The values include Chinese text, English text, and digits.
I want to split it on a specific pattern. The following is an example: (I want to find a pattern xxxxxx where xxxx is Chinese, English, digits, or any other symbol except "<" and ">", since those two symbols identify the tags)
However, I found something strange with this pattern. It does not seem to recognize the first two tags, only the second one.
String a = "<f\"number\">4 <f\"number\"><f$n0>14 <h85><f$n0>4 <f$n0>2 <f$n0>2 7 -<f\"Times-Roman\">7<f\"number\">";
Pattern p = Pattern.compile("<f\"number\">[\\P{sc=Han}*\\p{sc=Han}*[a-z]*[A-Z]*[0-9]*^<>]*<f\"number\">");
Matcher m = p.matcher(a);
while(m.find()){
System.out.println(m.group());
}
The output is the same as my String a.
Upvotes: 0
Views: 80
Reputation: 91385
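The character class [\\P{sc=Han}*\\p{sc=Han}*[a-z]*[A-Z]*[0-9]*^<>]* matches zero or more of any character, including "<" and ">", because \\P{sc=Han} and \\p{sc=Han} are opposites: together they cover every character. Inside a character class, "*" and a non-leading "^" are literal characters, not a quantifier or a negation.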
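To see this for yourself, here is a minimal sketch that tests the character class from the question against single characters. The class name CharClassDemo and the helper inClass are made up for illustration; only java.util.regex.Pattern is real API.

```java
import java.util.regex.Pattern;

final class CharClassDemo {
    // The character class from the question, without the trailing quantifier.
    static final Pattern QUESTION_CLASS = Pattern.compile(
        "[\\P{sc=Han}*\\p{sc=Han}*[a-z]*[A-Z]*[0-9]*^<>]");

    // Hypothetical helper: true if the one-character string is in the class.
    static boolean inClass(String ch) {
        return QUESTION_CLASS.matcher(ch).matches();
    }

    public static void main(String[] args) {
        // Every character is either Han or non-Han, so all of these match,
        // including the tag delimiters the question wanted to exclude.
        for (String ch : new String[] {"<", ">", "中", "a", "7", " ", "$"}) {
            System.out.println("'" + ch + "' in class: " + inClass(ch));
        }
    }
}
```

Since "<" and ">" are accepted too, the greedy quantifier runs to the end of the string and backtracks only as far as the last <f"number">, which would explain why the whole string is printed.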
I guess you want:
Pattern p = Pattern.compile("<f\"number\">[\\p{sc=Han}a-zA-Z0-9]*<f\"number\">");
You may want to add whitespace:
Pattern p = Pattern.compile("<f\"number\">[\\p{sc=Han}a-zA-Z0-9\\s]*<f\"number\">");
or:
Pattern p = Pattern.compile("<f\"number\">[^<]*<f\"number\">");
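As a rough check, the sketch below runs two of these patterns against the string from the question. It assumes the intent is \p{sc=Han} (lowercase p, which matches Han characters) and writes the whitespace escape as \\s so that it reaches the regex engine. The class name TagMatchDemo, the constants, and the helper findAll are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagMatchDemo {
    static final String TAG = "<f\"number\">";
    // Han characters, ASCII letters, digits and whitespace between the tags.
    static final String HAN_ALNUM_WS = TAG + "[\\p{sc=Han}a-zA-Z0-9\\s]*" + TAG;
    // Anything that does not start a new tag between the tags.
    static final String NOT_TAG_START = TAG + "[^<]*" + TAG;

    // Hypothetical helper: collect every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String a = "<f\"number\">4 <f\"number\"><f$n0>14 <h85><f$n0>4 <f$n0>2 "
                 + "<f$n0>2 7 -<f\"Times-Roman\">7<f\"number\">";
        System.out.println(findAll(HAN_ALNUM_WS, a));
        System.out.println(findAll(NOT_TAG_START, a));
    }
}
```

On this input both calls should find only <f"number">4 <f"number">, because every other <f"number"> is followed by another tag before the next <f"number">. Note that [^<]* still accepts ">"; if that must be excluded as well, [^<>]* would be the stricter choice.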
Upvotes: 2