user3691943

Reputation: 37

Regex and lookahead : java

I'm trying to remove punctuation, except dots (to keep the sentence structure), from a String with a regex. I don't really understand how it works yet; I just wrote this:

public static String removePunctuation(String s) {
    s = s.replaceAll("(?!.)\\p{Punct}", " ");
    return s;
}

I read that a "negative lookahead" can be used for this kind of problem, but when I run this code, it doesn't remove anything. The negative lookahead seems to cancel out the \p{Punct} part of the regex.

Upvotes: 1

Views: 94

Answers (2)

p.s.w.g

Reputation: 149030

The . character has special meaning in regular expressions. It essentially means 'any character except new lines' (unless the DOTALL flag is specified, in which case it means 'any character'). The lookahead (?!.) therefore only succeeds where the next character is a new line or the end of the input, so your pattern means 'any punctuation character that is also a new line character'. In other words, it never matches anything.

If you want it to mean a literal . character, you need to escape it like this:

s = s.replaceAll("(?!\\.)\\p{Punct}" , " ");      

Or wrap it in a character class, like this:

s = s.replaceAll("(?![.])\\p{Punct}" , " ");      
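To see the difference side by side, here is a minimal sketch that runs the original pattern and the escaped one on the same input. The class name, method names, and sample sentence are made up for illustration; only the two patterns come from the question and this answer.

```java
public class LookaheadDemo {
    // Pattern from the question: the unescaped dot in the lookahead
    // rejects every position that is followed by an ordinary character,
    // so \p{Punct} never gets a chance to match.
    static String broken(String s) {
        return s.replaceAll("(?!.)\\p{Punct}", " ");
    }

    // Escaped dot: the lookahead only rejects a literal '.',
    // so every other punctuation character is replaced by a space.
    static String fixed(String s) {
        return s.replaceAll("(?!\\.)\\p{Punct}", " ");
    }

    public static void main(String[] args) {
        String input = "Hello, world! It works.";
        System.out.println(broken(input)); // Hello, world! It works.
        System.out.println(fixed(input));  // Hello  world  It works.
    }
}
```

Note that each removed character becomes a space, so "Hello, world" ends up with two spaces in the middle.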

Upvotes: 1

Tim Pietzcker

Reputation: 336328

The unescaped dot matches anything (except newlines). You need at least

s = s.replaceAll("(?!\\.)\\p{Punct}" , " "); 

but for that sort of thing I'd much rather use a character class (within which the dot is no longer a metacharacter and therefore doesn't need to be escaped):

s = s.replaceAll("[^\\P{Punct}.]" , " ");  

Explanation:

  • [^abc] matches any character that's not an a, b, or c.
  • [^\P{Punct}] matches any character that's not a non-punctuation character (\P{Punct} is the negation of \p{Punct}), so it matches exactly the same characters as \p{Punct}.
  • [^\P{Punct}.] therefore matches any character that's a punctuation character except a dot.
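The explanation above can be checked one character at a time. This is only a sketch: the class name, the helper method, and the test strings are hypothetical, and only the character class itself comes from this answer.

```java
public class CharClassDemo {
    // Character class from the answer: punctuation, but not a dot.
    static final String PUNCT_EXCEPT_DOT = "[^\\P{Punct}.]";

    // Hypothetical helper: replaces every matching character with a space.
    static String removePunctuationExceptDots(String s) {
        return s.replaceAll(PUNCT_EXCEPT_DOT, " ");
    }

    public static void main(String[] args) {
        // String.matches tests the whole (one-character) string against the class.
        System.out.println(",".matches(PUNCT_EXCEPT_DOT)); // true:  punctuation, not a dot
        System.out.println(".".matches(PUNCT_EXCEPT_DOT)); // false: the dot is excluded
        System.out.println("a".matches(PUNCT_EXCEPT_DOT)); // false: not punctuation at all

        System.out.println(removePunctuationExceptDots("a,b.c!")); // a b.c (with a trailing space)
    }
}
```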

Upvotes: 3
