anthony

Reputation: 7733

Java: how to replace multiple carriage returns with only one?

I would like to clean the user comment:

Example:

"Hello guys,
it's my example,



to try to clean


my comment
"

And I would like:

"Hello guys,
it's my example,

to try to clean

my comment"

I tried with s.replaceAll("(?:\\n|\\r)", ""); but it doesn't work for my first case.

Thank you very much for your help!

Upvotes: 2

Views: 2693

Answers (3)

Andreas

Reputation: 159086

Including the comment you left for another answer, you want 3 things to happen:

  • 3 or more linebreaks should be reduced to 2, leaving at most one blank line.

  • All linebreaks at the end of text should be removed.

  • Spaces at the end of lines should be removed.

If you want all of that in a single regex, here it is:

replaceAll("(?:\\R|\\s)+$|[ \t]*(\\R)[ \t]*(\\R)(?:[ \t]*\\R)+", "$1$2")

The question uses the phrase "carriage return", which in Java is the \r character, but the sample code indicates that it actually means "line separator", or "linebreak" as it's called in the regex documentation, which is the \R regex pattern:

Any Unicode linebreak sequence, is equivalent to
\u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
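
To make the \R definition quoted above concrete, here is a minimal sketch that splits a string on \R. The class and method names (LinebreakSplitDemo, splitLines) are made up for illustration, and it assumes Java 8 or later, where \R is available.

```java
import java.util.Arrays;

class LinebreakSplitDemo {
    // Hypothetical helper: splits text on any Unicode linebreak sequence.
    static String[] splitLines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        // Windows (\r\n), Unix (\n), Unicode line separator (\u2028), old Mac (\r).
        String mixed = "a\r\nb\nc\u2028d\re";
        // prints [a, b, c, d, e]
        System.out.println(Arrays.toString(splitLines(mixed)));
    }
}
```

Note that the \r\n pair is treated as one separator here, so no empty string appears between "a" and "b".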

The first part of the regex above, (?:\\R|\\s)+$, eliminates all (+) linebreaks (\R) and/or whitespace characters (\s) at the end of input ($).

The second part uses the subpattern [ \t]*\\R 3 times. The subpattern matches a linebreak and all immediately preceding spaces and tabs.

To match a subpattern 3 or more times, you would normally use X{3,}, but we want to capture the first two linebreaks so we can retain them without knowing what kind of linebreak they are (e.g. Windows vs Linux). So we instead write the subpattern twice, with capture groups, then match 1 or more repetitions after that.

Finally, we replace the match with the two captured linebreaks. If the first part of the pattern matches, neither group captured anything, so the match is replaced with nothing, i.e. it's removed. If the second part matches, it's replaced with the first two matched linebreaks, i.e. those linebreaks are retained.
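
As a rough illustration of the regex described above, the following sketch applies it to a comment similar to the one in the question. The class name CommentCleanerDemo and the method clean are hypothetical, and the sample input deliberately uses only Unix-style \n linebreaks, so behavior on other linebreak styles is not exercised here.

```java
import java.util.regex.Pattern;

class CommentCleanerDemo {
    // The pattern from this answer, compiled once so it can be reused.
    private static final Pattern EXTRA_BREAKS = Pattern.compile(
            "(?:\\R|\\s)+$|[ \t]*(\\R)[ \t]*(\\R)(?:[ \t]*\\R)+");

    // Hypothetical helper: collapses runs of 3+ linebreaks to 2
    // and strips trailing linebreaks/whitespace at the end of the text.
    static String clean(String comment) {
        return EXTRA_BREAKS.matcher(comment).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String raw = "Hello guys,\nit's my example,\n\n\n\n"
                + "to try to clean  \n\n\nmy comment\n";
        // Show the linebreaks as visible \n markers for easier inspection.
        System.out.println(clean(raw).replace("\n", "\\n"));
    }
}
```

Compiling the pattern once is only a convenience for repeated calls; a one-off String.replaceAll with the same arguments should behave the same way.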

Upvotes: 1

Harmlezz

Reputation: 8068

Yet another solution which fulfills your requirements:

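public class CleanComment {
    public static void main(String[] args) {
        // Sample comment mixing Unix (\n) and Windows (\r\n) line breaks.
        String str =
                "\"Hello guys,\n" +
                "it's my example,\n" +
                "\r\n" +
                "\n" +
                "\r\n" +
                "to try to clean\n" +
                "\r\n" +
                "\n" +
                "my comment\n" +
                "\"";
        System.out.println("Before\n\n" + str);
        System.out.println("\n\nAfter:\n\n" + str
                // Collapse 3 or more consecutive line breaks into one blank line.
                .replaceAll("(\r\n|\n){3,}", "\n\n")
                // Drop the line breaks just before the closing quote.
                .replaceAll("(\r\n|\n)+\"$", "\""));
    }
}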

Output

Before

"Hello guys,
it's my example,



to try to clean


my comment
"


After:

"Hello guys,
it's my example,

to try to clean

my comment"

Upvotes: -2

Andremoniy

Reputation: 34900

This should be quite easy:

s.replaceAll("[\n\r]{2,}","\n\n")

It replaces every run of two or more consecutive line-break characters (\n or \r) with two line feeds.

UPDATE: @John Bollinger has pointed out a very good thing: "...this approach will convert single Windows-style line terminators to double Unix-style line terminators..."
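
The issue in that quote can be reproduced with a small sketch. The class and method names (CrlfPitfallDemo, collapseNaively) are invented for illustration.

```java
class CrlfPitfallDemo {
    // The first regex from this answer, wrapped in a hypothetical helper.
    static String collapseNaively(String s) {
        return s.replaceAll("[\n\r]{2,}", "\n\n");
    }

    public static void main(String[] args) {
        // A single Windows-style line break: no blank line in the input.
        String windowsText = "line one\r\nline two";
        String result = collapseNaively(windowsText);
        // Make control characters visible; prints: line one\n\nline two
        System.out.println(result.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Because \r and \n are counted as two separate characters by the character class, a single \r\n already satisfies {2,} and is turned into a blank line.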

So a better and more general approach would probably be:

s.replaceAll("(\n{2,})|(\r{2,})|((\r\n){2,})","\n\n")

UPDATE-2: To also remove trailing line breaks at the end of the text, perform: .replaceAll("[\n\r]+$","")
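
Chaining the two replacements from this answer might look like the sketch below. The names TwoStepCleaner and clean are hypothetical, and the replacement text is always Unix-style \n\n, as in the regex above.

```java
class TwoStepCleaner {
    // Hypothetical helper combining both replaceAll calls from this answer.
    static String clean(String s) {
        return s
                // Step 1: collapse repeated line breaks of one style into "\n\n".
                .replaceAll("(\n{2,})|(\r{2,})|((\r\n){2,})", "\n\n")
                // Step 2: strip line-break characters at the very end of the text.
                .replaceAll("[\n\r]+$", "");
    }

    public static void main(String[] args) {
        String raw = "Hello guys,\nit's my example,\n\n\n\nto try to clean\n\n\nmy comment\n";
        // Show line breaks as visible \n markers.
        System.out.println(clean(raw).replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

A single \r\n in the middle of the text is left as it is by step 1, which is the point of the more general regex.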

Upvotes: 7
