Reputation: 439
I have a file in which long lines have been wrapped: each continuation line starts with "+ ", and I need to unwrap those continuations back onto the line they belong to.
Example:
X123
+ a b c
+ d e f g
Y4567
+ a1 b2
+ c1 d2
+ e1 f2
Expected:
X123 a b c d e f g
Y4567 a1 b2 c1 d2 e1 f2
I tried: perl -00pe 's/\n\+ / /g'
but it failed with:
Substitution loop at -e line 1, <> chunk 1.
11.715u 18.455s 0:33.14 91.0% 0+0k 13426056+0io 155pf+0w
Upvotes: 10
Views: 702
Reputation: 71
The problem with this solution
perl -00pe 's/\n\+ / /g'
was simply that you didn't include the input file on the command line. Create an input file like this...
$ more combine.lines.txt
X123
+ a b c
+ d e f g
Y4567
+ a1 b2
+ c1 d2
+ e1 f2
here is a test
+ this is
+ a test
here is another test
+ this is
+ another
+ test
original line with no plus lines
Then run
$ perl -00pe 's/\n\+ / /g' combine.lines.txt
X123 a b c d e f g
Y4567 a1 b2 c1 d2 e1 f2
here is a test this is a test
here is another test this is another test
original line with no plus lines
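It looks to be working. However, I get the feeling that you didn't write that code yourself and don't really understand what it does. It is a slightly tricky solution: -00 puts Perl in paragraph mode, so a file with no blank lines is read as one big record, and s/\n\+ / /g then replaces every newline-plus-space sequence with a single space. Because of the file structure, each + line ends up joined behind the non-plus line before it.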
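The one-liner's approach (treat the whole input as one string and replace every newline followed by "+ " with a single space) can be sketched in Ruby for comparison. The method name unwrap_slurped is hypothetical, and like perl -00 on a file with no blank lines, this holds the entire input in memory at once.

```ruby
# Join continuation lines by slurping the whole input into one string.
# Assumption: every continuation line starts with "+ " (plus, then a space).
def unwrap_slurped(text)
  # "\n+ " marks the end of one physical line and the start of its
  # continuation; replacing it with " " glues the two pieces together.
  # A String pattern is matched literally, so "+" needs no escaping.
  text.gsub("\n+ ", " ")
end

# Only read input when file names are given on the command line.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  print unwrap_slurped(ARGF.read)   # ARGF concatenates the named files
end
```

Run it with the data file as an argument, e.g. ruby unwrap.rb combine.lines.txt (the script name is just an example).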
It may be useful to see this done manually. I have done the same thing reading the file line by line, appending each + line to the preceding original line. Here is the code...
#!/usr/bin/perl
use strict;
use warnings;

my $combinedLine = "";    # combine the original line and + lines here
while (<>) {
    chomp;                # remove the newline
    if (/^[^+]/) {
        # When you see a non-plus line, print the previous combined line
        # (often referred to as a buffer), then start a new one.
        print "$combinedLine\n" if length $combinedLine;
        $combinedLine = $_;      # start a new combined line
    }
    else {                       # plus line found
        s/^\+//;                 # remove the plus (the following space is kept)
        $combinedLine .= $_;     # append to the original line
    }
}
print "$combinedLine\n" if length $combinedLine;    # print the last line here
Output is the same as above...
$ perl combine.lines.pl combine.lines.txt
X123 a b c d e f g
Y4567 a1 b2 c1 d2 e1 f2
here is a test this is a test
here is another test this is another test
original line with no plus lines
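The same buffering idea can be sketched in Ruby (the method name unwrap_buffered is hypothetical). It keeps the current logical line in a buffer, appends each + line with only the plus stripped, and flushes the buffer whenever a new non-plus line starts or the input ends. One small difference from the Perl script: here an empty line starts a new record instead of being appended.

```ruby
# Line-by-line unwrapping with a buffer, a variant of the Perl script above.
# Assumption: continuation lines begin with "+"; any other line starts a new record.
def unwrap_buffered(lines)
  out = []
  buffer = nil                          # the logical line being built
  lines.each do |line|
    line = line.chomp                   # drop the trailing newline
    if line.start_with?("+")
      # Continuation: strip only the plus and keep the following space,
      # so the appended pieces stay separated.
      buffer = buffer.to_s + line.delete_prefix("+")
    else
      out << buffer unless buffer.nil?  # flush the previous logical line
      buffer = line                     # start a new one
    end
  end
  out << buffer unless buffer.nil?      # flush the final logical line
  out
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  unwrap_buffered(ARGF.each_line).each { |l| puts l }
end
```

Returning an array instead of printing directly is only a convenience for testing; the Perl version prints as it goes.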
Upvotes: -1
Reputation: 386541
You operated on a string that is more than 2^31 chars in length, which is longer than the regex engine can handle. To handle strings that long, upgrade to Perl 5.22 or higher.
perl5220delta:
s///g now works on very long strings (where there are more than 2 billion iterations) instead of dying with 'Substitution loop'. [GH #11742]. [GH #14190].
Alternatively, you could mess with the input record separator ($/).
perl -pe'BEGIN { $/ = "\n+ " } s/\n\+ \z/ /'
Depending on how many lines start with that sequence, this still risks not fixing the problem, so you could instead use a solution that never reads more than one line at a time.
perl -ne'
chomp;
print "\n" if !s/^\+ / / && $. != 1;
print;
END { print "\n"; }
'
Same idea, but shortened at the cost of readability:
perl -pe'print $l if !s/^\+ / /; $l = chop; END { print $l }'
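The streaming idea behind these one-liners can be sketched in Ruby too (the method name unwrap_streaming is hypothetical). It holds only one physical line at a time: the newline is removed, a leading "+ " becomes a single space, and a newline is written before every non-continuation line except the first, plus one at the very end. Passing the output target as a parameter is an assumption made so that either $stdout or a plain String can receive the result.

```ruby
# Stream-join continuation lines, holding only one physical line at a time.
# Assumption: continuation lines start with "+ " (plus, then a space).
def unwrap_streaming(input, output)
  first = true
  input.each_line do |line|
    line = line.chomp
    continuation = line.start_with?("+ ")
    line = " " + line.delete_prefix("+ ") if continuation   # "+ " becomes " "
    # A non-continuation line ends the previous logical line, so emit the
    # pending newline first (but never before the very first line).
    output << "\n" if !continuation && !first
    output << line
    first = false
  end
  output << "\n" unless first   # terminate the last logical line
  output
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  unwrap_streaming(ARGF, $stdout)
end
```

Because nothing is buffered beyond the current line, memory use stays flat no matter how large the file is.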
Upvotes: 11
Reputation: 16819
Borrowing @TLP's solution, with gawk, which allows RS to be a regex (standard awk doesn't):
gawk 1 RS='\n[+]' ORS= file
As @ikegami notes, this may not do the right thing if you have input like:
X123
+ a b c
+d e f g
that should become
X123 a b c
+d e f g
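To see that pitfall concretely, here is a small Ruby comparison (both method names are hypothetical). Removing every newline followed by "+", which is roughly what RS='\n[+]' with an empty ORS does, also swallows the "+d e f g" line, whereas requiring "+" followed by a space leaves it alone.

```ruby
# Compare two continuation markers on the edge case noted above.
def join_on_plus(text)
  text.gsub(/\n\+/, "")      # roughly like gawk RS='\n[+]' ORS=: drop every "\n+"
end

def join_on_plus_space(text)
  text.gsub("\n+ ", " ")     # only "+ " (plus, then space) marks a continuation
end

if __FILE__ == $PROGRAM_NAME
  sample = "X123\n+ a b c\n+d e f g\n"
  print join_on_plus(sample)        # prints "X123 a b cd e f g"
  print join_on_plus_space(sample)  # prints "X123 a b c", then "+d e f g"
end
```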
Upvotes: 4
Reputation: 104082
Given your input example, here is an awk solution:
awk '/^[^+]/{if (s) print s; s=$0; next}
{sub(/^\+/,""); s=s $0}
END{print s}' file
Or another awk:
awk 'sub(/^\+/,"")==0 && FNR>1 {print ""} {printf "%s", $0} END{print ""}' file
Or a Ruby:
ruby -ne 'chomp
puts if !$_.sub!(/^\+\s*/," ") && $. > 1
print $_ + ($<.eof? ? "\n" : "")' file
Each of those prints:
X123 a b c d e f g
Y4567 a1 b2 c1 d2 e1 f2
Upvotes: 5
Reputation: 67910
If you want a line-by-line version, you could change the input record separator to \n+ and then remove it with chomp. In effect, that just deletes those characters from the file in a normal -p one-liner, i.e.:
$ perl -pe'BEGIN{$/="\n+"}; chomp;' file.txt
The process is that it reads a "line" that ends with a newline and a plus and puts that in $_, then chomp removes that ending (chomp strips whatever $/ is currently set to), and the line is printed.
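A Ruby sketch of the same record-separator trick, for comparison (the method name unwrap_by_separator is hypothetical): each_line accepts a custom separator, and chomp with an argument strips exactly that suffix, much like setting $/ = "\n+" before chomp in Perl.

```ruby
# Unwrap by treating "\n+" as the record separator, like $/ = "\n+" in Perl.
# Each record ends with "\n+" just before a continuation's text; removing
# that suffix glues the continuation onto the previous line.
def unwrap_by_separator(input)
  out = String.new
  input.each_line("\n+") do |record|
    out << record.chomp("\n+")   # strip the separator only if it is the suffix
  end
  out
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  print unwrap_by_separator(ARGF)
end
```

Like the Perl one-liner, this also joins lines that start with "+" without a following space, so it shares the edge case mentioned for the gawk version.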
Upvotes: 6