Jim

Reputation: 4425

Cannot replace multiple empty lines with one

Why does the following not replace multiple empty lines with one?

$ cat some_random_text.txt  
foo   



bar   




test  

This does not work:

$ cat some_random_text.txt | perl -pe "s/\n+/\n/g"
foo  



bar  





test  

I am trying to replace each run of multiple newlines (i.e. empty lines) with a single empty line, but the regex I use for that does not work, as the example above shows.
What am I getting wrong?

Expected outcome is:

foo

bar

test

Upvotes: 1

Views: 312

Answers (3)

ikegami

Reputation: 386646

You are executing the following program:

LINE: while (<>) {
   s/\n+/\n/g;
}
continue {
   die "-p destination: $!\n" unless print $_;
}

Since you are reading one line at a time, and since a line is a sequence of characters that aren't line feeds terminated by a line feed, your pattern will never match more than one newline.


The simple fix is to tell Perl to treat the entire file as one line. Also, you don't want to replace every line feed, just those found in a sequence of two or more, and you want to replace each such sequence with two line feeds.

perl -0777pe's/\n\n\K\n+//g; s/^\n+//; s/\n\K\n\z//' some_random_text.txt

The second and third substitutions ensure there are no blank lines at the start and end of the file.


While reading the entire file into memory is easy, it's not necessary. The desired output can also be achieved by maintaining a flag that indicates whether the previous line was blank or not.

perl -ne'if (/\S/) { print "\n" if $f; print; $f=0 } else { $f=1 }' some_random_text.txt

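This solution also removes blank lines from the end of the file. As written, a run of blank lines at the start of the file is collapsed to a single blank line rather than removed, because the flag is already set when the first non-blank line arrives. Lines containing only whitespace count as blank.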

Upvotes: 2

dawg

Reputation: 104092

Given:

$ echo "$txt"
foo   



bar   




test  

You can use sed to reduce the runs of blank lines to a single \n:

$ echo "$txt" | sed '/^$/N;/^\n$/D'
foo   

bar   

test  

Even easier, you can use cat -s:

$ echo "$txt" | cat -s            # same output 

In perl, the idiomatic one-liner uses -00 for paragraph mode:

$ echo "$txt" | perl -00pe0       # same output 

In awk, you can enable paragraph mode by setting RS= and then set ORS= to whatever should replace each run of \n:

$ echo "$txt" | awk '1' RS= ORS="\n\n"    # same output 

ikegami correctly points out that printf 'a\n\n' | ... will produce two trailing newlines (that is, a trailing blank line) with these solutions. That may or may not be an issue.

Upvotes: 1

melpomene

Reputation: 85887

The reason it doesn't work is that -p tells perl to process the input line by line, and there's never more than one \n in a single line.

Better idea:

perl -00 -lpe 1
  • -00: Enable paragraph mode (input records are terminated by any sequence of 2+ newlines).
  • -l: Enable autochomp mode (the input record separators are trimmed automatically, so since we're in paragraph mode, all trailing newlines are removed, and output records get "\n\n" added).
  • -p: Enable automatic input/output (the main code is executed for each input record; anything left in $_ is printed automatically).
  • -e 1: Use a dummy main program that does nothing.

Taken together, this does nothing except normalize paragraph terminators to exactly two newlines.

Upvotes: 6
