Reputation: 323
I'm formatting a very large number of plain-text files in Java, and I need to remove all punctuation except apostrophes. When I originally set up the regex for the replaceAll statement, it removed every punctuation mark I knew of, but now I've found one particular file/punctuation set where it doesn't work.
holdMe = holdMe.replaceAll("[,_\"-.!?:;)(}{]", " ");
I know this statement is being executed, because all the other punctuation is removed: there are no periods, commas, etc. I've tried escaping the () and {} characters, but those characters still aren't replaced. I've been trying to teach myself regex from the Oracle documentation, but I can't work out why this isn't working.
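One likely culprit, offered as a guess because the question doesn't say which characters survive: inside a character class, \"-. is read as a range from " (U+0022) to . (U+002E). That range also contains # $ % & ' ( ) * + , so the pattern removes the apostrophe it is supposed to keep, while characters it never mentions, such as [, ], / or typographic quotes, pass straight through. The sketch below checks this; the class name RangeCheck and the sample characters are assumptions for illustration.

```java
import java.util.regex.Pattern;

public class RangeCheck {
    // The asker's character class, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("[,_\"-.!?:;)(}{]");

    public static void main(String[] args) {
        // '"-.' is parsed as the range U+0022..U+002E, which includes
        // the apostrophe U+0027, so this prints true.
        System.out.println("' -> " + ORIGINAL.matcher("'").matches());

        // Characters that are not in the class (or the range) are never replaced.
        for (String s : new String[] {"[", "]", "/", "\u201C"}) {
            System.out.println(s + " -> " + ORIGINAL.matcher(s).matches());
        }
    }
}
```

If the leftover characters in the problem file turn out to be non-ASCII lookalikes (fullwidth brackets, curly quotes), listing ASCII characters will never catch them, which is what the \p{P} and whitelist answers address.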
Upvotes: 1
Views: 4723
Reputation: 1776
Check this:
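public class Main {
    public static void main(String[] args) {
        /* Escape [ and ] with \\ (double backslash in a Java string); escaping { } is harmless.
           The hyphen is escaped as well (\\-) so it is a literal, not a range that would also remove apostrophes. */
        String m = "this:{[]}/; is a test".replaceAll("[\\[\\]\\{\\}\\/,_\"\\-.!?:;)(]", " ");
        System.out.println(m);
    }
}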
Output (each removed character becomes its own space):
this        is a test
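If runs of punctuation should collapse to a single space instead, a small variation of the same blacklist works. This is a sketch under that assumption: the class is the one above with the hyphen escaped, a + quantifier is added, and leftover whitespace is squeezed afterwards. The names StripPunct and strip are hypothetical.

```java
public class StripPunct {
    // Same blacklist as above; '\\-' keeps the hyphen literal and '+' matches whole runs.
    static String strip(String text) {
        return text.replaceAll("[\\[\\]{}/,_\"\\-.!?:;()]+", " ")
                   .replaceAll("\\s+", " ")  // collapse the spaces left behind
                   .trim();                  // drop leading/trailing space
    }

    public static void main(String[] args) {
        System.out.println(strip("this:{[]}/; is a test")); // this is a test
        System.out.println(strip("don't (ever) stop."));    // don't ever stop
    }
}
```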
Upvotes: 1
Reputation: 4887
This regex matches every punctuation character except the apostrophe:
[\p{P}&&[^\u0027]]
As a Java string literal:
"[\\p{P}&&[^\u0027]]"
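A quick way to see what the class intersection does, as a sketch: \p{P} is the Unicode "Punctuation" category, and &&[^'] removes the apostrophe from it. In Java source, \u0027 is translated to ' by the compiler before the string is parsed, so the sketch writes ' directly. Note that characters such as $ and + are Unicode symbols (categories Sc and Sm), not punctuation, so this class leaves them alone. The names UnicodePunct and strip are hypothetical.

```java
public class UnicodePunct {
    // Unicode punctuation minus the apostrophe; each match becomes one space.
    static String strip(String text) {
        return text.replaceAll("[\\p{P}&&[^']]", " ");
    }

    public static void main(String[] args) {
        // Curly quotes, comma and period are replaced; the apostrophe in "It's" is kept.
        System.out.println(strip("\u201CIt's done\u201D, she said."));
        // $ and + are symbols, not punctuation, so they pass through unchanged.
        System.out.println(strip("a$b+c"));
    }
}
```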
Upvotes: 6
Reputation: 420
Instead of listing every single character you want removed, why not do the opposite: state which characters you want to keep, and negate the class with ^?
holdMe = holdMe.replaceAll("[^a-zA-Z0-9'\\s]+"," ");
This replaces every run of characters other than whitespace, ASCII letters and digits, and apostrophes with a single " ".
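Since the question involves a very large number of files, it may be worth compiling the pattern once: per its Javadoc, String.replaceAll(regex, repl) is equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so it compiles the regex on every call. The sketch below does that, and also shows a trade-off of the a-zA-Z whitelist: accented letters count as "other" and get replaced. The \p{L}\p{N} variant is an assumed alternative, not part of the answer above; the names Whitelist, ASCII_ONLY, ANY_SCRIPT and clean are hypothetical.

```java
import java.util.regex.Pattern;

public class Whitelist {
    // The answer's pattern, compiled once and reused for every file/line.
    static final Pattern ASCII_ONLY = Pattern.compile("[^a-zA-Z0-9'\\s]+");

    // Assumed Unicode-aware variant: \p{L} and \p{N} match letters and
    // digits in any script, so words like "Café" stay intact.
    static final Pattern ANY_SCRIPT = Pattern.compile("[^\\p{L}\\p{N}'\\s]+");

    static String clean(Pattern p, String text) {
        return p.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String s = "Caf\u00E9 isn't {closed}!";
        System.out.println(clean(ASCII_ONLY, s)); // the accented e is replaced too
        System.out.println(clean(ANY_SCRIPT, s)); // the accented e is kept
    }
}
```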
Upvotes: 1