Reputation: 53
I have 2 strings:
String 1 is read from a text file with a BufferedReader using "UTF-8" encoding:
Tân_Dậu 1921 – Kỉ_Mão 1999
String 2 is typed in by hand:
Tân_Dậu 1921 - Kỉ_Mão 1999
and my regex pattern string:
[(]?([A-ZTĐẤ][a-záââậầấẹịỉìíợnọúùửỵýỷ]+[_][A-ZDĐẤ][a-záậãâậầấẹuịìíợọúùửỵýỷ]+)?[ ]?((\\d{4})|([?]))[ ]?[-][ ]?(([A-ZĐKẤ][a-záâỉoậầấẹịỉìíợọúùửỵýỷ]+[_][A-ZĐẤ][a-záãâậầấãẹịìíợọúùửỵýỷ]+))?[ ]?(\\d{4}|\\d{2}[)])[ ]?[)]?
I use:
Matcher m = p.matcher(test.trim());
while(m.find())
{
System.out.println("-->"+m.group());
}
Here 'test' is string 1 or string 2, but only string 2 matches. What is the problem and how can I solve it? Thanks for your help.
Upvotes: 3
Views: 758
Reputation: 52185
The problem is the dash. You seem to have two versions of it: string 1 contains an en dash (–), while string 2 and your pattern use the plain ASCII hyphen (-). Changing your expression to this: [(]?([A-ZTĐẤ][a-záââậầấẹịỉìíợnọúùửỵýỷ]+[_][A-ZDĐẤ][a-záậãâậầấẹuịìíợọúùửỵýỷ]+)?[ ]?((\\d{4})|([?]))[ ]?[-–][ ]?(([A-ZĐKẤ][a-záâỉoậầấẹịỉìíợọúùửỵýỷ]+[_][A-ZĐẤ][a-záãâậầấãẹịìíợọúùửỵýỷ]+))?[ ]?(\\d{4}|\\d{2}[)])[ ]?[)]?
should do the trick.
Notice how [-] has been changed to [-–].
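To check which dash a string really contains, it can help to print the character's code point. The sketch below is only an illustration: the class name DashDemo, the helper names, and the shortened "year, dash, year" patterns are made up for this example and stand in for the full pattern from the question. It assumes the dash read from the file is the en dash U+2013. The third pattern uses \p{Pd} (Unicode dash punctuation), which is an alternative not mentioned above and which accepts other dash variants too.

```java
import java.util.regex.Pattern;

public class DashDemo {
    // Shortened stand-ins for the question's pattern: year, optional spaces, dash, year.
    // Matches only the ASCII hyphen-minus (U+002D).
    static final Pattern ASCII_ONLY = Pattern.compile("\\d{4} ?[-] ?\\d{4}");
    // Matches the ASCII hyphen-minus or the en dash (U+2013), like [-–] in the answer.
    static final Pattern EITHER_DASH = Pattern.compile("\\d{4} ?[-\\u2013] ?\\d{4}");
    // Alternative (an assumption, not from the answer): any Unicode dash punctuation.
    static final Pattern ANY_DASH = Pattern.compile("\\d{4} ?\\p{Pd} ?\\d{4}");

    // Returns true if the pattern is found anywhere in the input.
    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    // Formats the character at the given index as a code point label, e.g. "U+002D".
    static String codePointAt(String s, int index) {
        return String.format("U+%04X", s.codePointAt(index));
    }

    public static void main(String[] args) {
        String fromFile = "1921 \u2013 1999"; // en dash, as assumed for the text file
        String typed = "1921 - 1999";         // hyphen-minus, as typed on a keyboard

        // The dash sits at index 5 in both strings.
        System.out.println(codePointAt(fromFile, 5)); // U+2013
        System.out.println(codePointAt(typed, 5));    // U+002D

        System.out.println(found(ASCII_ONLY, fromFile));  // false
        System.out.println(found(EITHER_DASH, fromFile)); // true
        System.out.println(found(EITHER_DASH, typed));    // true
        System.out.println(found(ANY_DASH, fromFile));    // true
    }
}
```

If the printed code point for the file's dash turns out to be something other than U+2013, the same idea applies: add that character to the character class, or use the broader \p{Pd} class.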
Upvotes: 3