Perl - push lines in between regex matches into one element of an array

Question

This is the log file I am dealing with -

|
blah1a
blah1b
blah1c
|
****blahnothing1
|
blah2a
blah2b
blah2c
|
blahnothing2
|
blah3a
blah3b
blah3c
|
blahnothing3

The information that I need is nestled between two pipe characters. There are a lot of lines that start with asterisks; I skip over them. Each line has Windows end-of-line characters. The data between the pipe characters is contiguous, but when read on a Linux host it is chopped up by the Windows newlines. I wrote the Perl script with a range operator between two delimiter matches, hoping that everything after a pipe delimiter would get pushed into one array element until the next pipe delimiter, and then start again. Each array element would hold all the lines between the two pipe characters.

Ideally the array would look like this, sans the Windows control characters:

$lines[0] blah1a blah1b blah1c
$lines[1] blah2a blah2b blah2c
$lines[2] blah3a blah3b blah3c

However, the array elements do not look like that.

#!/usr/bin/perl

use strict ;
use warnings ;

my $delimiter = "|";
my $filename = $ARGV[0] ;
my @lines ;
open(my $fh, '<:encoding(UTF-8)' , $filename) or die "could not open file $filename $!";

while (my $line = readline $fh) {
    next if ($line =~/^\*+/) ;
    if ($line =~ /$delimiter/ ... $line =~/$delimiter/) {
        push (@lines, $line) ;
    }
}

print  $lines[0] ;
print  $lines[1] ;
print  $lines[2] ;

Borodin · Accepted Answer

This seems to satisfy your requirement.

I've left the two lines blahnothing2 and blahnothing3 in place, as I couldn't see a rationale for removing them.

The \R regex pattern is the generic newline (available since Perl 5.10). It matches the newline sequences of any platform, including CR, LF, and CRLF, and it treats CRLF as a single unit.
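As a quick illustration of the generic newline, here is a minimal sketch that flattens the same two-line text under three line-ending conventions. The sample strings and the %samples and %flattened names are made up for this example and are not part of the original log.

```perl
use strict;
use warnings;

# Hypothetical samples: the same two lines with different line endings.
my %samples = (
    unix    => "blah1a\nblah1b\n",
    windows => "blah1a\r\nblah1b\r\n",
    old_mac => "blah1a\rblah1b\r",
);

my %flattened;
for my $name (sort keys %samples) {
    # \R consumes CRLF as one unit, so each line ending becomes one space
    (my $text = $samples{$name}) =~ s/\R/ /g;
    $flattened{$name} = $text;
    print "$name: [$text]\n";
}
```

All three samples should come out as the same string, which is why the substitution in the code below does not need to know where the file was written.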

use strict;
use warnings 'all';

my $data = do {
    open my $fh, '<:raw', 'blah.txt' or die $!;
    local $/;
    <$fh>;
};

$data =~ s/^\s*\*.*\R/ /gm; # Remove lines starting with *
$data =~ s/\R/ /g;          # Change all line endings to spaces

# Split on pipe and remove blank elements
my @data = grep /\S/, split /\s*\|\s*/, $data; 

use Data::Dump;
dd \@data;

Output:

[
  "blah1a blah1b blah1c",
  "blah2a blah2b blah2c",
  "blahnothing2",
  "blah3a blah3b blah3c",
  "blahnothing3 ",
]
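If you do want only the three elements shown in the question, one possible line-by-line alternative is sketched below. It rests on two points. First, an unescaped | in a regex is alternation between two empty patterns, so /$delimiter/ in the original script matches every line; \Q...\E makes the pipe literal. Second, it assumes the pipes strictly alternate between opening and closing a block, which holds for the sample log but is an assumption about the real file. The in-memory $log string and the $inside and @current names are illustrative only.

```perl
use strict;
use warnings;

# Sample input with Windows line endings, mirroring the log in the question.
my $log = join "\r\n",
    '|', 'blah1a', 'blah1b', 'blah1c',
    '|', '****blahnothing1',
    '|', 'blah2a', 'blah2b', 'blah2c',
    '|', 'blahnothing2',
    '|', 'blah3a', 'blah3b', 'blah3c',
    '|', 'blahnothing3', '';

my $delimiter = '|';
my (@lines, @current);
my $inside = 0;    # true while between an opening and a closing pipe

# Read from the in-memory string; a real script would open the log file.
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
while (my $line = <$fh>) {
    $line =~ s/\R\z//;                 # strip a trailing LF or CRLF
    next if $line =~ /^\*/;            # skip lines starting with an asterisk

    if ($line =~ /^\Q$delimiter\E\s*$/) {    # \Q...\E makes the pipe literal
        # A closing pipe ends the block: join its lines into one element
        push @lines, join ' ', @current if $inside && @current;
        @current = ();
        $inside  = !$inside;           # assumed: each pipe toggles the state
        next;
    }
    push @current, $line if $inside;   # lines outside a block are ignored
}
close $fh;

print "$_\n" for @lines;
```

With the sample above, @lines should end up holding one joined string per pipe-delimited block, with the blahnothing lines dropped because they fall outside any block.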
