Reputation: 2799
I wrote a little test to demonstrate:
@Test
public void missingPunctuationRegex() {
    Pattern punct = Pattern.compile("[\\p{Punct}]");
    // ASCII apostrophe (U+0027)
    Matcher m = punct.matcher("'");
    assertTrue("ascii punctuation", m.find());
    // LEFT SINGLE QUOTATION MARK (U+2018)
    m = punct.matcher("‘");
    assertTrue("unicode punctuation", m.find());
}
The first assert passes and the second one fails. You may have to squint to see it, but the second character is the LEFT SINGLE QUOTATION MARK (U+2018), which should count as punctuation as far as I can tell.
How can I match all punctuation in Java regular expressions?
Upvotes: 6
Views: 5406
Reputation: 111349
You can use the UNICODE_CHARACTER_CLASS flag to make \p{Punct} match all Unicode punctuation.
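As a minimal sketch of the flag in use (the class and method names are hypothetical, chosen only for illustration), the pattern can be compiled with Pattern.UNICODE_CHARACTER_CLASS and checked against both quote characters from the question:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class UnicodePunctDemo {
    // With UNICODE_CHARACTER_CLASS, \p{Punct} follows the Unicode
    // punctuation property instead of the ASCII-only list.
    private static final Pattern PUNCT =
            Pattern.compile("\\p{Punct}", Pattern.UNICODE_CHARACTER_CLASS);

    // Returns true if the input contains at least one punctuation character.
    static boolean containsPunctuation(String s) {
        return PUNCT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsPunctuation("'"));      // ASCII apostrophe
        System.out.println(containsPunctuation("\u2018")); // LEFT SINGLE QUOTATION MARK
    }
}
```

The same flag can also be switched on inside the pattern itself with the embedded flag expression (?U), for example "(?U)\\p{Punct}".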
Upvotes: 8
Reputation: 280132
The Javadoc of Pattern states:
\p{Punct}    Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
You'll have to match U+2018 explicitly, as it is not part of \p{Punct}:
Pattern punct = Pattern.compile("[\\p{Punct}‘]");
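Listing characters one by one gets tedious if more than a few are needed. A possible generalization, sketched here under the assumption that "punctuation" means the Unicode general category P, is to union the ASCII \p{Punct} set with \p{P}. This is not taken from the answer above, and the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class AnyPunctDemo {
    // \p{Punct} (no flags): the ASCII list quoted from the Javadoc.
    // \p{P}: any character in the Unicode general category "Punctuation".
    private static final Pattern PUNCT = Pattern.compile("[\\p{Punct}\\p{P}]");

    // Returns true if the input contains at least one matching character.
    static boolean containsPunctuation(String s) {
        return PUNCT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsPunctuation("'"));      // ASCII apostrophe
        System.out.println(containsPunctuation("\u2018")); // LEFT SINGLE QUOTATION MARK
        System.out.println(containsPunctuation("$"));      // ASCII symbol, in \p{Punct}
    }
}
```

Keeping \p{Punct} in the class matters because the ASCII list includes characters such as $ and +, which Unicode classifies as symbols (category S) rather than punctuation.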
Upvotes: 2