budi

Reputation: 6551

Regex add space between all punctuation

I need to add spaces between all punctuation in a string.

// "Hello: World." -> "Hello : World ."
// "It's 9:00?"    -> "It ' s 9 : 00 ?"
// "1.B,3.D!"      -> "1 . B , 3 . D !"

I think a regex is the way to go: match all non-punctuation with [a-zA-Z\\d]+, add a space before and/or after it, then extract the remainder by matching all punctuation with [^a-zA-Z\\d]+.

But I don't know how to (recursively?) call this regex. Looking at the first example, the regex will only match the "Hello". I was thinking of just building a new string by continuously removing and appending the first instance of the matched regex, while the original string is not empty.

private String addSpacesBeforePunctuation(String s) {
    StringBuilder builder = new StringBuilder();
    final String nonpunctuation = "[a-zA-Z\\d]+";
    final String punctuation = "[^a-zA-Z\\d]+";

    String found;
    while (!s.isEmpty()) {

        // regex stuff goes here

        found = ???; // found group from respective regex goes here
        builder.append(found);
        builder.append(" ");
        s = s.replaceFirst(found, "");
    }

    return builder.toString().trim();
}

However, this doesn't feel like the right way to go. I think I'm overcomplicating things.

Upvotes: 3

Views: 1183

Answers (2)

anubhava

Reputation: 785481

You can use a lookaround-based regex with the punctuation property \p{Punct} in Java:

str = str.replaceAll("(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)", " ");
  • (?<=\\S) asserts that the previous character is not whitespace
  • (?<=\\p{Punct}) asserts that the previous character is a punctuation character
  • (?=\\p{Punct}) asserts that the next character is a punctuation character
  • (?=\\S) asserts that the next character is not whitespace
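
As a rough, self-contained check of how these assertions combine, the sketch below wraps the same pattern in a small class and runs it on the three examples from the question. The class name LookaroundSpacer and the method name addSpaces are made up for illustration. Note that, by default, \p{Punct} in Java covers only the ASCII punctuation characters.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the answer.
final class LookaroundSpacer {
    // Zero-width position with a non-whitespace character on both sides
    // and a punctuation character on at least one side.
    private static final Pattern GAP = Pattern.compile(
            "(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)");

    // Inserts a single space at every such position.
    static String addSpaces(String s) {
        return GAP.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(addSpaces("Hello: World.")); // Hello : World .
        System.out.println(addSpaces("It's 9:00?"));    // It ' s 9 : 00 ?
        System.out.println(addSpaces("1.B,3.D!"));      // 1 . B , 3 . D !
    }
}
```

Because the match is zero-width, nothing is consumed: the replacement only inserts a space, and positions that already touch whitespace are skipped by the two \S assertions.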

Upvotes: 5

Sergey Kalinichenko

Reputation: 726809

When you see a punctuation mark, you have four possibilities:

  1. Punctuation is surrounded by spaces.
  2. Punctuation is preceded by a space, but not followed by one.
  3. Punctuation is followed by a space, but not preceded by one.
  4. Punctuation is neither preceded nor followed by a space.

Here is code that does the replacement properly:

String ss = s
    .replaceAll("(?<=\\S)\\p{Punct}", " $0")
    .replaceAll("\\p{Punct}(?=\\S)", "$0 ");

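It uses two expressions. The first inserts a space before any punctuation mark that directly follows a non-space character, which handles case 3. The second inserts a space after any punctuation mark that is directly followed by a non-space character, which handles case 2. Since the second expression is applied to the result of the first, together they take care of case 4 as well. Case 1 requires no change.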
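
To make the two passes easier to follow, here is a minimal sketch that splits them into separate methods so the intermediate string can be inspected. The class name TwoPassSpacer and the method names spaceBefore, spaceAfter and addSpaces are invented for this illustration; the regular expressions are the ones shown above.

```java
// Hypothetical helper class; the names are illustrative, not from the answer.
final class TwoPassSpacer {
    // Pass 1: put a space before punctuation that directly follows a non-space character.
    static String spaceBefore(String s) {
        return s.replaceAll("(?<=\\S)\\p{Punct}", " $0");
    }

    // Pass 2: put a space after punctuation that is directly followed by a non-space character.
    static String spaceAfter(String s) {
        return s.replaceAll("\\p{Punct}(?=\\S)", "$0 ");
    }

    // Both passes, applied in the same order as in the answer.
    static String addSpaces(String s) {
        return spaceAfter(spaceBefore(s));
    }

    public static void main(String[] args) {
        System.out.println(spaceBefore("1.B,3.D!"));    // 1 .B ,3 .D !
        System.out.println(addSpaces("1.B,3.D!"));      // 1 . B , 3 . D !
        System.out.println(addSpaces("Hello: World.")); // Hello : World .
        System.out.println(addSpaces("It's 9:00?"));    // It ' s 9 : 00 ?
    }
}
```

Punctuation at the very start or end of the string gets no extra leading or trailing space, because the lookbehind and lookahead both require an actual non-space character next to it.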

Upvotes: 2
