rapt

Reputation: 12220

Remove all consecutive blank lines, leaving just one: perl -00 -pe ''

I am trying to understand the following Perl command, which is supposed to "remove all consecutive blank lines, leaving just one":

perl -00 -pe ''

From Perl One-Liners Explained:

First of all it does not have any code, the -e is empty. Next it has a silly -00 command line option. This command line option turns paragraph slurp mode on. A paragraph is text between two newlines. All the other newlines get ignored. The paragraph gets put in "$_" and the "-p" option prints it out.

I do not follow this explanation. Maybe the wording is not accurate.

So "A paragraph is text between two newlines." But every line is text between two newlines.

"All the other newlines get ignored." But there are no newlines between two successive newlines.

"The paragraph gets put in "$_" and the "-p" option prints it out." Since it does this to the text between every two newlines, that would cram the whole file together into one long line. How does that produce what they say this command is supposed to do?

It also says that an alternative way to write it is

perl -00pe0

What does the rightmost 0 represent?

Anyway, what I actually want to achieve is to remove all consecutive white lines, leaving just one empty line. By white line I mean a line that may not be empty but contains only whitespace characters (and the newline). Is it possible to modify the above command to handle this case?

Upvotes: 1

Views: 1314

Answers (2)

Nahuel Fouilleul

Reputation: 19305

The B::Deparse module can be used to reveal the effective code behind a one-line program. It can be enabled in a one-liner by adding -MO=Deparse, like this:

perl -MO=Deparse -00 -p -e 0

The -0 option sets the value of $/, the input record separator. Setting it to the empty string "" with -00 enables "paragraph mode", which means the input is split at runs of one or more blank lines.

Another special value for -0 is -0777, which disables the record separator so that the whole file is read at once. $/ may also be set to a reference to a number, like \8192, to read fixed-length records, but this is not available through the -0 option.

If the file is not too large, read the whole file at once:

perl -0777 -pe 's/\n\s+\n/\n\n/g'

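Otherwise the file can be read in chunks of, say, 8192 bytes. If a chunk ends with a newline followed only by whitespace, the next chunk must be appended before processing, so that a run of blank lines is not split across two chunks: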

perl -pe 'BEGIN { $/ = \8192} $_ .= <> while /\n\s*$/ && ! eof; s/\n\s+\n/\n\n/g'

Upvotes: 1

choroba

Reputation: 241768

It's better to read the official documentation when in doubt. See -0 in perlrun and $/ in perlvar.

The text should have said

A paragraph is text delimited by two or more newlines.

"All other newlines" then become the newlines that don't come in pairs. "Ignored" means they don't separate paragraphs, but they are included in the strings read from the input.

-e0 just executes 0 as the code. The constants 0 and 1 are exempt from warnings; any other value would work too, but with -w Perl would warn you:

Useless use of a constant (2) in void context at -e line 1.

To achieve what you want, you can process the file in two steps. First, remove any whitespace from whitespace-only lines:

perl -lpe 's/^\s+$//'

(The -l is needed so that the newline is not removed together with the rest of the whitespace: it strips the newline before the substitution runs and adds it back when printing.)

Then run the already known

perl -00pe0

So, the whole pipeline becomes

perl -lpe 's/^\s+$//' -- file | perl -00pe0

You can, of course, do all the work in one call to perl:

perl -ne 'if (/\S/)         { $in_sep = ! print }
          elsif (! $in_sep) { $in_sep = print "\n" }' -- file

$in_sep remembers whether we are "in a separator": a newline is printed only the first time we enter such a whitespace block.

Upvotes: 6
