Reputation: 81
I'm trying to remove all spaces from lines in a block of text which contain nothing but spaces, leaving the line breaks in place.
I tried the following:
String str = " text\n \n \n text";
str = str
.replaceAll("\\A +\\n", "\n")
.replaceAll("(\\n +\\n)", "\n\n")
.replaceAll("\\n +\\Z", "\n");
I was expecting the output to be
" text\n\n\n text"
but instead it was
" text\n\n \n text"
The space in the third line of the block had not been removed. What am I doing wrong here?
Upvotes: 1
Views: 1478
Reputation: 626845
You need to match lines with horizontal spaces only, and the Pattern.MULTILINE modifier is required for the ^ and $ anchors to match the start and end of lines, respectively (its embedded option is (?m)). Use
String str = " text\n \n \n text";
str = str.replaceAll("(?m)^[\\p{Zs}\t]+$", "");
See the Java demo.
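A minimal, self-contained sketch of the replacement above, for trying it locally. The class and method names (ZsBlankLines, clearBlankLines) are made up for illustration. The second call assumes you also want a line made of a no-break space (U+00A0, which is in category Zs) plus a tab to be emptied.

```java
// Hypothetical demo class; only the regex and String.replaceAll come from the answer.
class ZsBlankLines {
    // (?m) turns on MULTILINE so ^ and $ match at line boundaries;
    // [\p{Zs}\t]+ matches one or more Unicode space separators or tabs.
    static String clearBlankLines(String s) {
        return s.replaceAll("(?m)^[\\p{Zs}\t]+$", "");
    }

    public static void main(String[] args) {
        // Escape the newlines when printing so the result is visible on one line.
        System.out.println(clearBlankLines(" text\n \n \n text").replace("\n", "\\n"));
        // A line holding a no-break space and a tab is cleared as well.
        System.out.println(clearBlankLines(" text\n\u00A0\t\n text").replace("\n", "\\n"));
    }
}
```

Lines that contain text keep their leading space, because $ cannot match right after the spaces on those lines.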
Details:
- (?m) - Multiline mode
- ^ - start of line
- [\\p{Zs}\t]+ - one or more horizontal whitespace characters
- $ - end of line.

An alternative to [\p{Zs}\t] is a pattern that matches any whitespace except the line-break characters. In Java, character class subtraction can be handy: [\s&&[^\r\n]], where [\s] matches any whitespace and &&[^\r\n] excludes the carriage return and newline characters from it. A full pattern would look like .replaceAll("(?Um)^[\\s&&[^\r\n]]+$", ""), where (?U) (UNICODE_CHARACTER_CLASS) makes \s match Unicode whitespace as well.
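A small sketch contrasting the subtraction pattern with and without the (?U) flag. The class and method names are hypothetical; the test input assumes a line made of an EM SPACE (U+2003) and a tab, which plain \s does not fully cover.

```java
// Hypothetical demo class comparing the class-subtraction pattern with and without (?U).
class SubtractionBlankLines {
    // [\s&&[^\r\n]] = any whitespace AND NOT carriage return or newline.
    // (?U) (UNICODE_CHARACTER_CLASS) makes \s match Unicode white space too.
    static String unicodeAware(String s) {
        return s.replaceAll("(?Um)^[\\s&&[^\r\n]]+$", "");
    }

    // Same pattern without (?U): \s only covers ASCII whitespace [ \t\n\x0B\f\r].
    static String asciiOnly(String s) {
        return s.replaceAll("(?m)^[\\s&&[^\r\n]]+$", "");
    }

    public static void main(String[] args) {
        // Second line: EM SPACE (U+2003) followed by a tab; third line: a plain space.
        String str = " text\n\u2003\t\n \n text";
        System.out.println(unicodeAware(str).replace("\n", "\\n"));
        System.out.println(asciiOnly(str).replace("\n", "\\n"));
    }
}
```

Without (?U), the EM SPACE line survives because its first character is not in ASCII \s, while the plain-space line is still emptied.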
Upvotes: 2
Reputation: 89557
Use anchors:
str = str.replaceAll("(?m)^[^\\S\\n]+$", "");
where ^ and $ match the start and the end of a line, respectively, when the multiline flag (?m) is switched on.
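A runnable sketch of this pattern, using a hypothetical class name. It assumes blank lines may mix spaces and tabs; [^\S\n] reads as "neither a non-whitespace character nor a newline", which is any whitespace except \n.

```java
// Hypothetical demo class for the double-negated character class.
class NotNonSpaceBlankLines {
    // [^\S\n] = not a non-whitespace character and not a newline,
    // i.e. any whitespace character except \n.
    static String clearBlankLines(String s) {
        return s.replaceAll("(?m)^[^\\S\\n]+$", "");
    }

    public static void main(String[] args) {
        // Blank lines made of spaces, a tab, or a mix are emptied;
        // lines with text keep their leading space.
        String str = " text\n \t \n\t\n text";
        System.out.println(clearBlankLines(str).replace("\n", "\\n"));
    }
}
```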
The problem with your pattern is that you use \\n on both sides of the horizontal whitespace in replaceAll("(\\n +\\n)", "\n\n") (plain spaces in your pattern). That consumes both newlines, so you can't get adjacent matches: the \n that ends one blank line can't also start the next one, because the same character can't be matched twice.
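A sketch that reproduces the effect on the question's input. The lookaround variant is not from any of the answers; it is shown only to illustrate that checking for a \n without consuming it lets adjacent blank lines share a newline. Class and method names are hypothetical.

```java
// Hypothetical demo class: why the question's middle replaceAll misses a line.
class ConsumedNewlines {
    // The question's pattern consumes the \n on both sides of the spaces, so
    // after one match the next search starts past that trailing \n and the
    // following blank line has no \n left in front of it to match.
    static String questionPattern(String s) {
        return s.replaceAll("(\\n +\\n)", "\n\n");
    }

    // Alternative (not from the answers): lookbehind/lookahead only check for
    // the surrounding \n without consuming it. Blank first/last lines are ignored.
    static String lookaroundPattern(String s) {
        return s.replaceAll("(?<=\\n) +(?=\\n)", "");
    }

    public static void main(String[] args) {
        String str = " text\n \n \n text";
        System.out.println(questionPattern(str).replace("\n", "\\n"));
        System.out.println(lookaroundPattern(str).replace("\n", "\\n"));
    }
}
```

The first line printed matches the output reported in the question, with the space on the third line still present.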
Note: if you need to handle Windows (\r\n) or old Mac (\r) line endings, also add \\r to the character class (to exclude it, like \\n).
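A sketch of the note above, assuming input with Windows (\r\n) or classic Mac OS (\r) line endings. The class name is hypothetical; the regex is the answer's pattern with \\r added to the class.

```java
// Hypothetical demo class: the same idea with \r also excluded from the class.
class CrlfBlankLines {
    // [^\S\r\n] = any whitespace except \r and \n, so the line terminator
    // (\n, \r\n or \r) is never part of the match.
    static String clearBlankLines(String s) {
        return s.replaceAll("(?m)^[^\\S\\r\\n]+$", "");
    }

    public static void main(String[] args) {
        String windows = " text\r\n \r\n \r\n text"; // Windows line endings
        String oldMac = " text\r \r \r text";        // classic Mac OS line endings
        System.out.println(clearBlankLines(windows).equals(" text\r\n\r\n\r\n text"));
        System.out.println(clearBlankLines(oldMac).equals(" text\r\r\r text"));
    }
}
```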
Upvotes: 1
Reputation: 33476
Use the MULTILINE flag so that ^ and $ match the beginning and end of each line. The problem with your regex is that it consumes the newline character, so the next search starts after that newline and the following blank line can no longer be matched.
str = str.replaceAll("(?m)^ +$", "");
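A sketch of the same regex written with the Pattern.MULTILINE constant instead of the embedded (?m); the class name is hypothetical. It also shows a limitation of " +": only plain spaces are matched, so a line holding a tab is left as-is.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: the answer's regex with the MULTILINE flag constant.
class MultilineFlagBlankLines {
    // Equivalent to "(?m)^ +$": ^ and $ match at the start and end of every line.
    private static final Pattern BLANK_LINE = Pattern.compile("^ +$", Pattern.MULTILINE);

    static String clearBlankLines(String s) {
        return BLANK_LINE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clearBlankLines(" text\n \n \n text").replace("\n", "\\n"));
        // Only plain spaces are matched: a line holding a tab is not cleared.
        System.out.println(clearBlankLines(" text\n \n\t\n text").replace("\n", "\\n"));
    }
}
```

Compiling the Pattern once also avoids recompiling the regex on every call, which String.replaceAll does internally.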
Upvotes: 3