Reputation: 12220
I am trying to understand the following Perl command, which is meant to "remove all consecutive blank lines, leaving just one":
perl -00 -pe ''
From Perl One-Liners Explained:
First of all it does not have any code, the -e is empty. Next it has a silly -00 command line option. This command line option turns paragraph slurp mode on. A paragraph is text between two newlines. All the other newlines get ignored. The paragraph gets put in "$_" and the "-p" option prints it out.
I do not follow this explanation. Maybe the wording is not accurate.
So "A paragraph is text between two newlines."
But every line is text between two newlines.
"All the other newlines get ignored."
But there are no newlines between two successive newlines.
"The paragraph gets put in "$_" and the "-p" option prints it out."
Since this is done for the text between every two newlines, that would cram the whole file into one long line. How does that amount to what they say this command is supposed to do?
It also says that an alternative way to write it is
perl -00pe0
What does the rightmost 0 represent?
Anyway, what I actually want to achieve is to remove all consecutive white lines, leaving just one empty line. By white line I mean a line that may not be empty, but contains only whitespace characters (and a newline). Is it possible to modify the above command to handle this case?
Upvotes: 1
Views: 1314
Reputation: 19305
The B::Deparse module may be used to reveal the effective code behind a one-line program. It can be enabled in a one-liner by adding -MO=Deparse, like this:
perl -MO=Deparse -00 -p -e 0
The -0 option sets the value of $/, the input record separator. Setting it to the empty string "" with -00 enables "paragraph mode", which means the input will be split at one or more blank lines.
Another special value for -0 is -0777, which disables the record separator so that the whole file is read at once. $/ may also be set to a reference to a number, \<number>, like \8192, so as to read records of a fixed length, but this is unavailable through the -0 option.
If the file is not too long, read the whole file at once:
perl -0777 -pe 's/\n\s+\n/\n\n/g'
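Otherwise the file can be read in chunks of, say, 8192 bytes. A run of blank lines may straddle a chunk boundary, though, so when a chunk ends in a newline followed only by whitespace, the next chunk must be appended before processing.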
perl -pe 'BEGIN { $/ = \8192} $_ .= <> while /\n\s*$/ && ! eof; s/\n\s+\n/\n\n/g'
Upvotes: 1
Reputation: 241768
It's better to read the official documentation when in doubt. See -0 in perlrun and $/ in perlvar.
The text should have said
A paragraph is text delimited by two or more newlines.
"All other newlines" then become the newlines that don't come in pairs. "Ignored" means they don't separate paragraphs, but they are included in the strings read from the input.
-e0 just executes 0 as the code. 0 and 1 are exempt from warnings; any other value would work too, but with -w it would warn you:
Useless use of a constant (2) in void context at -e line 1.
To achieve what you want, you can process the file in two steps. First, remove all whitespace from whitespace-only lines:
perl -lpe 's/^\s+$//'
(The -l is needed so that the newlines are not removed together with the rest of the whitespace.)
Then run the already known
perl -00pe0
So, the whole pipeline becomes
perl -lpe 's/^\s+$//' -- file | perl -00pe0
You can, of course, do all the work in one call to perl:
perl -ne 'if (/\S/) { $in_sep = ! print }
elsif (! $in_sep) { $in_sep = print "\n" }' -- file
$in_sep remembers whether we are "in a separator"; a newline is printed only the first time we enter such a block of whitespace lines.
Upvotes: 6