Emman

Reputation: 1244

How do I extract the content between two markers when the start and end markers are the same string, in Perl?

The data file contains the following content:

 7e a1 00 00 00 00 00 00  00 00 00 05 00 00 00
 ea 5b ee fb 7e 7e a1 01  00 00 00 00 00 00 00
 05 00 00 00 c0 9c ba e1  66 7e 7e a1 02 00 00
 00 00 00 00 00 05 00 00  00 c0 47 9f 80 1a 7e

I want to print the bytes from the 1st 7e to the 2nd 7e, then from the 3rd 7e to the 4th 7e, and so on. My expected output is shown below, excluding the quoted `7e` at each end of every line. Because the start and end markers are the same string, the conventional range-based method does not work.

 `7e` a1 00 00 00 00 00 00 00 00 00 00 05 00 00 00 ea 5b ee fb `7e` 
 `7e` a1 01 00 00 00 00 00 00 00 00 05 00 00 00 c0 9c ba e1 66 `7e`
 `7e` a1 02 00 00 00 00 00 00 00 00 05 00 00 00 c0 47 9f 80 1a `7e`

I tried the following initial Perl script, but the results are not as expected. Can anyone clarify my understanding?

    use strict;
    use warnings;
    my $filename = 'input_file.txt';
    open(my $fh, '<:encoding(UTF-8)', $filename)
      or die "Could not open file '$filename' $!";
    my $count=0;
    while (<$fh>) {
      if (/7e/../7e/) {
        next if /7e/ || /7e/;
        print;
      }
    }

I also need to check that each even-numbered 7e (the 2nd, 4th, and so on) is immediately followed by the 7e that starts the next block; otherwise an error should be flagged.

Upvotes: 0

Views: 85

Answers (2)

Borodin

Reputation: 126762

A large part of your problem is that the data is split across lines. You really need to process each byte separately.

This program reads in the entire dump and puts all the data into a single line in `$data`, with a single space between the bytes. A simple global regex pattern then finds all the subsequences that you want.

    use strict;
    use warnings 'all';
    use feature 'say';

    local $/;
    my $data = join ' ', split ' ', <DATA>;

    say $1 while $data =~ /7e\s(.+?)\s7e/g;

    __DATA__
     7e a1 00 00 00 00 00 00  00 00 00 05 00 00 00
     ea 5b ee fb 7e 7e a1 01  00 00 00 00 00 00 00
     05 00 00 00 c0 9c ba e1  66 7e 7e a1 02 00 00
     00 00 00 00 00 05 00 00  00 c0 47 9f 80 1a 7e

Output:

    a1 00 00 00 00 00 00 00 00 00 05 00 00 00 ea 5b ee fb
    a1 01 00 00 00 00 00 00 00 05 00 00 00 c0 9c ba e1 66
    a1 02 00 00 00 00 00 00 00 05 00 00 00 c0 47 9f 80 1a
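
The question also asks for an error when a closing 7e is not immediately followed by the next opening 7e. The following is only a rough sketch of one way that check might be done byte by byte; the helper name `extract_frames` is hypothetical, and the code assumes that 7e never occurs inside a block, so every 7e simply toggles between "inside" and "outside".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): walks the dump one
# byte at a time and returns (\@frames, \@errors).
# Assumption: '7e' never appears inside a frame, so each '7e' toggles state.
sub extract_frames {
    my ($dump) = @_;
    my @bytes = split ' ', $dump;    # one element per hex byte
    my (@frames, @errors, @current);
    my $in_frame = 0;

    for my $i (0 .. $#bytes) {
        my $byte = $bytes[$i];
        if ($byte eq '7e') {
            if ($in_frame) {         # closing delimiter: store the frame
                push @frames, join ' ', @current;
                @current = ();
            }
            $in_frame = !$in_frame;
        }
        elsif ($in_frame) {
            push @current, $byte;    # payload byte inside a frame
        }
        else {
            # Anything between a closing 7e and the next opening 7e
            push @errors, "unexpected byte '$byte' at position $i";
        }
    }
    push @errors, 'unterminated frame at end of input' if $in_frame;
    return (\@frames, \@errors);
}

my $dump = <<'END';
 7e a1 00 00 00 00 00 00  00 00 00 05 00 00 00
 ea 5b ee fb 7e 7e a1 01  00 00 00 00 00 00 00
 05 00 00 00 c0 9c ba e1  66 7e 7e a1 02 00 00
 00 00 00 00 00 05 00 00  00 c0 47 9f 80 1a 7e
END

my ($frames, $errors) = extract_frames($dump);
print "$_\n" for @$frames;
warn  "$_\n" for @$errors;
```

Collecting the errors in a list rather than dying on the first one is just a design choice for the sketch; it lets the caller decide whether a stray byte should be fatal.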

Upvotes: 1

TLP

Reputation: 67920

That looks like a hex dump, which would make 7e represent ~. Are you sure that parsing the hex dump is what you want to do?

The problem with your code is that your data spans line endings, while you are reading the file line by line. Moreover, you are skipping every line that contains 7e, which means that you cut content from some lines.

This is probably simplest to solve by setting the input record separator, `$/`, to 7e. Perl then reads records that end with the string 7e instead of \n.

I am also using a counter to keep every second record and skip the ones in between, which should contain nothing but whitespace. I am using Data::Dumper to display the data in a more readable way.

    use strict;
    use warnings;
    use Data::Dumper;

    $/ = '7e';
    my $count;
    my @data;
    while (<DATA>) {
        chomp;
        if ($count++ % 2) {
            push @data, $_;
        } else {
            warn "Data in wrong place ('$_')" if /\S/;
        }
    }
    print Dumper \@data;

    __DATA__
     7e a1 00 00 00 00 00 00  00 00 00 05 00 00 00
     ea 5b ee fb 7e 7e a1 01  00 00 00 00 00 00 00
     05 00 00 00 c0 9c ba e1  66 7e 7e a1 02 00 00
     00 00 00 00 00 05 00 00  00 c0 47 9f 80 1a 7e

Output:

    $VAR1 = [
              ' a1 00 00 00 00 00 00  00 00 00 05 00 00 00
     ea 5b ee fb ',
              ' a1 01  00 00 00 00 00 00 00
     05 00 00 00 c0 9c ba e1  66 ',
              ' a1 02 00 00
     00 00 00 00 00 05 00 00  00 c0 47 9f 80 1a '
            ];
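
The records above still contain the original line breaks and double spaces. If each block should end up on a single line, a variation along these lines might work. It is only a sketch: it reads from an in-memory filehandle instead of `DATA` so that it is self-contained, and the variable names `$dump`, `$fh`, `$record` and `@frames` are my own choices.

```perl
use strict;
use warnings;

my $dump = <<'END';
 7e a1 00 00 00 00 00 00  00 00 00 05 00 00 00
 ea 5b ee fb 7e 7e a1 01  00 00 00 00 00 00 00
 05 00 00 00 c0 9c ba e1  66 7e 7e a1 02 00 00
 00 00 00 00 00 05 00 00  00 c0 47 9f 80 1a 7e
END

# Open a read handle on the string so the example needs no external file
open my $fh, '<', \$dump or die "Cannot open in-memory handle: $!";

my @frames;
{
    local $/ = '7e';                 # each record now ends at a 7e
    my $count = 0;
    while (my $record = <$fh>) {
        chomp $record;               # strips the trailing '7e', if present
        if ($count++ % 2) {
            # Collapse newlines and repeated spaces into single spaces
            push @frames, join ' ', split ' ', $record;
        }
        elsif ($record =~ /\S/) {
            warn "Data in wrong place ('$record')\n";
        }
    }
}
close $fh;

print "$_\n" for @frames;
```

Using `local $/` inside a bare block limits the changed separator to the read loop, so any later line-based input in the same program is unaffected.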

Upvotes: 2
