Reputation: 2635
This is the log file I am dealing with:
|
blah1a
blah1b
blah1c
|
****blahnothing1
|
blah2a
blah2b
blah2c
|
blahnothing2
|
blah3a
blah3b
blah3c
|
blahnothing3
The information I need is nestled between two pipe characters. There are a lot of lines that start with asterisks; I skip over them. Each line has Windows end-of-line characters. The data between the pipe characters is contiguous, but when it is read on a Linux host it is chopped up by the Windows newlines. I wrote the Perl script with a range operator between the two delimiter lines, hoping that everything starting at a pipe delimiter would get pushed into an array element, stop at the next pipe delimiter, and then start again. Each array element would then hold all the lines between the two pipe characters.
Ideally the array would look like this, without the Windows control characters:
$lines[0] blah1a blah1b blah1c
$lines[1] blah2a blah2b blah2c
$lines[2] blah3a blah3b blah3c
However, the array elements do not look like that.
#!/usr/bin/perl
use strict ;
use warnings ;
my $delimiter = "|";
my $filename = $ARGV[0] ;
my @lines ;
open(my $fh, '<:encoding(UTF-8)' , $filename) or die "could not open file $filename $!";
while (my $line = readline $fh) {
    next if ($line =~ /^\*+/);
    if ($line =~ /$delimiter/ ... $line =~ /$delimiter/) {
        push(@lines, $line);
    }
}
print $lines[0] ;
print $lines[1] ;
print $lines[2] ;
Upvotes: 0
Views: 908
Reputation: 66899
It seems that you want to merge the lines between | characters into a single string, which then gets placed on an array.
One way is to set | as the input record separator, so that each read returns the chunk between two pipes:
use strict;
use warnings;

my $filename = $ARGV[0];   # as in the question's script

{   # localize the change to $/
    local $/ = "|";
    open(my $fh, '<:encoding(UTF-8)', $filename)
        or die "could not open file $filename $!";
    my @records;
    while (my $section = <$fh>)
    {
        next if $section =~ /^\s*\*/;
        chomp $section;          # remove the record separator (| here)
        $section =~ s/\R/ /g;    # clean up newlines
        $section =~ s/^\s*//;    # clean up leading spaces
        push @records, $section if $section;
    }
    print "$_\n" for @records;
}
I skip a "section" if it starts with * (possibly preceded by whitespace, such as the newline left over after the previous pipe). There can be more restrictive versions. The $section can end up being an empty string, so it is pushed onto the array conditionally.
Output, with the example from the question copy-pasted into the input file $filename:
blah1a blah1b blah1c
blah2a blah2b blah2c
blahnothing2
blah3a blah3b blah3c
blahnothing3
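To see why the skip test allows leading whitespace and why the push is conditional, it can help to look at the raw records that <$fh> returns when $/ is "|". The sketch below is illustrative only: it uses a small hypothetical sample string (LF endings, with "****skip" standing in for an asterisk line) read through an in-memory filehandle instead of the real log file.

```perl
use strict;
use warnings;

# Hypothetical sample in the question's layout (LF endings for readability).
my $text = "|\nblah1a\nblah1b\n|\n****skip\n|\n";

my @raw;   # records as <$fh> returns them, after chomp
{
    local $/ = "|";    # each read stops after the next "|"
    open my $fh, '<', \$text or die "could not open in-memory file: $!";
    while (my $rec = <$fh>) {
        chomp $rec;    # chomp removes the current $/ ("|"), not "\n"
        push @raw, $rec;
    }
    close $fh;
}

for my $i (0 .. $#raw) {
    (my $shown = $raw[$i]) =~ s/\n/\\n/g;   # make newlines visible
    print "$i: [$shown]\n";
}
```

This prints four records: an empty one (the leading "|"), "\nblah1a\nblah1b\n", "\n****skip\n" and a final "\n". Every record after a pipe begins with the newline that followed that pipe, so a plain /^\*/ would not catch the asterisk line, while /^\s*\*/ does; and the first and last records reduce to empty strings once newlines and leading spaces are cleaned up.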
The approach in the question is fine, but you need to merge the lines within a "section" (between pipes) and place each such string on the array. So you need a flag to track when you enter and leave a section.
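A minimal sketch of that flag approach, applied to the question's script. It assumes each delimiter sits alone on its own line and embeds the question's sample log (with CRLF endings) as an in-memory string; for a real file, swap in the question's open(...) call. It also escapes the delimiter with \Q...\E: in the question, "|" is interpolated into /$delimiter/ unescaped, and a bare | is regex alternation between two empty patterns, so it matches every line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the question's sample log, with Windows (CRLF) line endings.
my $text = join "\r\n",
    '|', 'blah1a', 'blah1b', 'blah1c',
    '|', '****blahnothing1',
    '|', 'blah2a', 'blah2b', 'blah2c',
    '|', 'blahnothing2',
    '|', 'blah3a', 'blah3b', 'blah3c',
    '|', 'blahnothing3';
$text .= "\r\n";

my $delimiter  = '|';
my @lines;          # one merged string per section
my @current;        # lines collected for the section being read
my $in_section = 0; # flag: are we between an opening and a closing pipe?

open my $fh, '<', \$text or die "could not open in-memory file: $!";
while (my $line = <$fh>) {
    $line =~ s/\R\z//;                          # strip CRLF or LF
    next if $line =~ /^\*/;                     # skip asterisk lines
    if ($line =~ /\A\Q$delimiter\E\z/) {        # escaped: matches a literal "|"
        if ($in_section) {                      # closing pipe: store the section
            push @lines, join ' ', @current;
            @current = ();
        }
        $in_section = !$in_section;             # toggle enter/leave
        next;
    }
    push @current, $line if $in_section;        # lines outside sections are ignored
}
close $fh;
# A section still open at end of file is discarded.

print "$_\n" for @lines;
```

Run as is, it prints blah1a blah1b blah1c, blah2a blah2b blah2c and blah3a blah3b blah3c, one per line. The blahnothing lines fall between a closing pipe and the next opening pipe, so they are dropped, matching the array layout described in the question.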
Upvotes: 1
Reputation: 126742
This seems to satisfy your requirement.
I've left the two lines blahnothing2 and blahnothing3 in place, as I couldn't see a rationale for removing them.
The \R regex pattern is the generic newline: it matches the newline sequences from any platform, including CR, LF, and CRLF.
use strict;
use warnings 'all';
my $data = do {
    open my $fh, '<:raw', 'blah.txt' or die $!;
    local $/;
    <$fh>;
};
$data =~ s/^\s*\*.*\R/ /gm; # Remove lines starting with *
$data =~ s/\R/ /g; # Change all line endings to spaces
# Split on pipe and remove blank elements
my @data = grep /\S/, split /\s*\|\s*/, $data;
use Data::Dump;   # CPAN module; exports dd()
dd \@data;
[
"blah1a blah1b blah1c",
"blah2a blah2b blah2c",
"blahnothing2",
"blah3a blah3b blah3c",
"blahnothing3 ",
]
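A small illustration of why the \R pattern used above is safer than \n for files that may carry Windows line endings. The mixed-ending string is a made-up example: splitting on \n leaves a stray carriage return on the CRLF line and misses the bare CR, while \R treats CRLF as a single line break.

```perl
use strict;
use warnings;

# Made-up input with CRLF (Windows), LF (Unix) and a bare CR.
my $mixed = "blah1a\r\nblah1b\nblah1c\rblah1d";

my @naive = split /\n/, $mixed;   # "blah1a\r", "blah1b", "blah1c\rblah1d"
my @parts = split /\R/, $mixed;   # "blah1a", "blah1b", "blah1c", "blah1d"

# Same substitution as in the answer: every line break becomes one space.
(my $joined = $mixed) =~ s/\R/ /g;

print scalar(@naive), "\n";   # 3
print scalar(@parts), "\n";   # 4
print "$joined\n";            # blah1a blah1b blah1c blah1d
```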
Upvotes: 2