Reputation: 4425
Why does the following not replace multiple empty lines with one?
$ cat some_random_text.txt
foo
bar
test
and this does not work:
$ cat some_random_text.txt | perl -pe "s/\n+/\n/g"
foo
bar
test
I am trying to collapse each run of consecutive newlines (i.e. multiple empty lines) into a single empty line, but the regex I use for that does not work, as you can see in the snippet above.
What am I messing up?
Expected outcome is:
foo
bar
test
Upvotes: 1
Views: 312
Reputation: 386646
You are executing the following program:
LINE: while (<>) {
    s/\n+/\n/g;
}
continue {
    die "-p destination: $!\n" unless print $_;
}
Since you are reading one line at a time, and since a line is a sequence of characters that aren't line feeds terminated by a line feed, your pattern will never match more than one newline.
The simple fix is to tell Perl to treat the entire file as one line. Also, you don't want to replace every line feed, but just those found in a sequence of two or more, and you want to replace the sequence with two line feeds.
perl -0777pe's/\n\n\K\n+//g; s/^\n+//; s/\n\K\n\z//' some_random_text.txt
The second and third substitutions ensure there are no blank lines at the start and end of the file.
While reading the entire file into memory is easy, it's not necessary. The desired output can also be achieved by maintaining a flag that indicates whether the previous line was blank or not.
perl -ne'if (/\S/) { print "\n" if $f; print; $f=0 } elsif (defined $f) { $f=1 }' some_random_text.txt
This solution also removes blank lines from the start and end of the file.
Upvotes: 2
Reputation: 104092
Given:
$ echo "$txt"
foo
bar
test
You can use sed to reduce the runs of blank lines to a single \n:
$ echo "$txt" | sed '/^$/N;/^\n$/D'
foo
bar
test
Even easier, you can use cat -s:
$ echo "$txt" | cat -s # same output
In perl the idiomatic 1 liner is to use -00 for paragraph mode:
$ echo "$txt" | perl -00pe0 # same output
And in awk you have the flexibility of using paragraph mode by setting RS= and then setting ORS= to what you want the replacement for runs of \n to be:
$ echo "$txt" | awk '1' RS= ORS="\n\n" # same output
ikegami correctly states that printf 'a\n\n' | ... will produce two trailing newlines (i.e. a trailing blank line) with these solutions. That may or may not be an issue.
Upvotes: 1
Reputation: 85887
The reason it doesn't work is that -p tells perl to process the input line by line, and there's never more than one \n in a single line.
Better idea:
perl -00 -lpe 1
-00: Enable paragraph mode (input records are terminated by any sequence of 2+ newlines).
-l: Enable autochomp mode (the input record separators are trimmed automatically, so since we're in paragraph mode, all trailing newlines are removed, and output records get "\n\n" added).
-p: Enable automatic input/output (the main code is executed for each input record; anything left in $_ is printed automatically).
-e 1: Use a dummy main program that does nothing.
Taken all together this does nothing except normalize paragraph terminators to exactly two newlines.
Upvotes: 6