Heather T

Reputation: 323

Java regex to remove specific punctuation

I'm formatting a very large number of plain-text files using Java, and I need to remove all punctuation except apostrophes. The regex I originally set up for the replaceAll statement got rid of everything I knew of, but I've now found one particular file and set of punctuation where it doesn't work.

    holdMe = holdMe.replaceAll("[,_\"-.!?:;)(}{]", " ");

I know the statement is executing because all of the other punctuation is cleared: no periods, commas, and so on remain. I've tried escaping the () and {} characters, but those characters still aren't replaced. I've been teaching myself regex from the Oracle documentation, but I can't work out why this isn't working.

Upvotes: 1

Views: 4723

Answers (3)

Soley

Reputation: 1776

Check this:

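    public class Main {
        public static void main(String[] args) {
            // Use \\ (double backslash) before { } [ ] in the string literal.
            // The hyphen is placed last so it is a literal and not a range operator.
            String m = "this:{[]}/; is a test".replaceAll("[\\[\\]\\{\\}\\/,_\".!?:;)(-]", " ");
            System.out.println(m);
        }
    }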

Output:

this        is a test
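One thing worth checking in the pattern from the question: inside a character class, a hyphen between two characters defines a range, so `"-.` is read as every character from `"` (U+0022) to `.` (U+002E). That range includes the apostrophe, which the question wants to keep. The sketch below compares the two placements of the hyphen; the class name `HyphenRangeDemo`, the `strip` helper, and the sample text are made up for illustration.

```java
class HyphenRangeDemo {
    // Pattern from the question: the hyphen sits between '"' and '.', forming a range.
    static final String RANGE_PATTERN = "[,_\"-.!?:;)(}{]";

    // Same characters, with the hyphen moved to the end where it is a literal.
    static final String LITERAL_PATTERN = "[,_\".!?:;)(}{-]";

    // Hypothetical helper: replace every match of the pattern with one space.
    static String strip(String pattern, String text) {
        return text.replaceAll(pattern, " ");
    }

    public static void main(String[] args) {
        String sample = "don't stop (now) - 50%";
        // The range version also blanks the apostrophe and the percent sign.
        System.out.println(strip(RANGE_PATTERN, sample));
        // The literal version leaves both in place.
        System.out.println(strip(LITERAL_PATTERN, sample));
    }
}
```

If apostrophes are disappearing from your output, the hyphen position is a likely cause. It would not explain punctuation that survives, though, so the leftover characters in your file may be something outside this list altogether.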

Upvotes: 1

Andie2302

Reputation: 4887

This regex matches every punctuation character except the apostrophe:

[\p{P}&&[^\u0027]]

The same regex as a Java string literal:

"[\\p{P}&&[^\u0027]]"
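As a minimal sketch of how this could be plugged into replaceAll (the class name `PunctuationStripper` and the method `stripPunctuation` are hypothetical, chosen only for this example):

```java
class PunctuationStripper {
    // \p{P} is the Unicode punctuation category; && intersects it with
    // "any character except U+0027", so the apostrophe is left alone.
    static final String PUNCT_EXCEPT_APOSTROPHE = "[\\p{P}&&[^\u0027]]";

    // Hypothetical helper: replace each matched punctuation character with a space.
    static String stripPunctuation(String text) {
        return text.replaceAll(PUNCT_EXCEPT_APOSTROPHE, " ");
    }

    public static void main(String[] args) {
        System.out.println(stripPunctuation("don't {stop}, [now]!"));
    }
}
```

Note that `\p{P}` covers punctuation only. Characters in the Unicode symbol categories, such as `+`, `$` or `~`, are not matched and would need `\p{S}` added to the class if they should be removed too.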

Upvotes: 6

Othya

Reputation: 420

Instead of listing every single character you want removed, why not do the opposite: state which characters you want to keep, and negate the class with ^?

holdMe = holdMe.replaceAll("[^a-zA-Z0-9'\\s]+"," ");

The above replaces each run of characters other than whitespace, ASCII letters and digits, and apostrophes with a single " ".
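One caveat with this allow-list: `a-zA-Z0-9` is ASCII only, so accented letters are removed as well. The sketch below compares it with a variant that uses the Unicode categories `\p{L}` (letters) and `\p{N}` (numbers). That variant is my own assumption about what you might want, and the names `AllowListCleaner` and `clean` are hypothetical.

```java
class AllowListCleaner {
    // Pattern from above: keep ASCII letters, digits, apostrophes and whitespace.
    static final String ASCII_ALLOW_LIST = "[^a-zA-Z0-9'\\s]+";

    // Assumed variant: keep any Unicode letter or number instead of ASCII only.
    static final String UNICODE_ALLOW_LIST = "[^\\p{L}\\p{N}'\\s]+";

    // Hypothetical helper: replace each run of disallowed characters with one space.
    static String clean(String pattern, String text) {
        return text.replaceAll(pattern, " ");
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9, isn't it?!";
        // The ASCII version drops the accented letter along with the comma.
        System.out.println(clean(ASCII_ALLOW_LIST, sample));
        // The Unicode version keeps the accented letter.
        System.out.println(clean(UNICODE_ALLOW_LIST, sample));
    }
}
```

Which one fits depends on whether your text files contain anything beyond plain English.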

Upvotes: 1
