Ionut

Reputation: 2858

Java Unicode regex not matching German characters

This question is based on this question.

I am using \P{M}\p{M}* to match all letters (both German and French).

I chose this regex to avoid listing every Unicode range explicitly, as in: ^[a-zA-Z[\\u00c0-\\u01ff]]+[\\']?(([-]?[a-zA-Z[\\u00c0-\\u01ff]]*[\\s]?)|([\\s]?[a-zA-Z[\\u00c0-\\u01ff]]*[-]?)){1,2}[a-zA-Z[\\u00c0-\\u01ff]]+$

However, despite using the Unicode format described in the previous question, characters such as ß or è are not matched by the regex.

I am using JDK 6.

What am I missing? Thanks!

Upvotes: 3

Views: 2415

Answers (2)

Bohemian

Reputation: 425073

Use the Unicode category \p{L} for "any letter":

System.out.println("abcßè".matches("\\p{L}+")); // true
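A minimal sketch of how \p{L} compares with the \P{M}\p{M}* pattern from the question. The class and method names (LetterMatchSketch, isLetters, isNonMarkSequence) are hypothetical and only for illustration. It assumes the input may contain decomposed characters (a base letter followed by a combining mark), in which case \p{L}+ alone would not match and the marks have to be allowed explicitly.

```java
import java.util.regex.Pattern;

public class LetterMatchSketch {
    // Letters only: each base letter may be followed by combining marks.
    static final Pattern LETTERS = Pattern.compile("(?:\\p{L}\\p{M}*)+");
    // Pattern from the question: any non-mark code point plus its marks.
    static final Pattern NON_MARKS = Pattern.compile("(?:\\P{M}\\p{M}*)+");

    static boolean isLetters(String s) {
        return LETTERS.matcher(s).matches();
    }

    static boolean isNonMarkSequence(String s) {
        return NON_MARKS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String precomposed = "abc\u00DF\u00E8"; // "abcßè" written with escapes
        String decomposed = "e\u0300";          // 'e' followed by a combining grave accent

        System.out.println(isLetters(precomposed));         // true
        System.out.println(isLetters(decomposed));          // true
        System.out.println(decomposed.matches("\\p{L}+"));  // false: the accent is a mark, not a letter
        System.out.println(isLetters("abc 123"));           // false: space and digits are not letters
        System.out.println(isNonMarkSequence("abc 123"));   // true: \P{M} accepts anything that is not a mark
    }
}
```

The last two lines show the practical difference: \P{M}\p{M}* accepts digits and whitespace as well, so it is broader than "all letters".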

Upvotes: 3

Antoine Wils

Reputation: 349

Using Java 6, this code:

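    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Test {
        public static void main(String[] args) {
            String str = "hello ß you";
            // (?:...) is a non-capturing group around one non-mark character plus its combining marks
            Pattern p = Pattern.compile("(?:\\P{M}\\p{M}*)+");
            Matcher matcher = p.matcher(str);
            System.out.println("replaced: '" + matcher.replaceAll("") + "'");
        }
    }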

prints: replaced: ''

The entire string, including the 'ß', is matched and removed.
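To see what each individual match looks like, rather than only the result of removing them all, the matches can be collected one by one. This is a rough sketch with hypothetical names (ClusterListSketch, clusters); it uses \P{M}\p{M}* without the outer repetition so that every base character plus its combining marks becomes a separate match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClusterListSketch {
    // One non-mark code point followed by zero or more combining marks.
    private static final Pattern CLUSTER = Pattern.compile("\\P{M}\\p{M}*");

    // Returns every match of CLUSTER in the input, in order.
    static List<String> clusters(String input) {
        List<String> result = new ArrayList<String>();
        Matcher m = CLUSTER.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> parts = clusters("hello \u00DF you"); // "hello ß you"
        System.out.println(parts.size());                  // 11: one match per character, spaces included
        System.out.println(parts.contains("\u00DF"));      // true: 'ß' is matched on its own
        System.out.println(clusters("e\u0300x").size());   // 2: "e" plus its accent, then "x"
    }
}
```

Writing the 'ß' as the escape \u00DF keeps the example independent of the encoding used for the source file.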

Upvotes: 0
