perl get multiple lines from text file between pattern

Question

I have an HTML file containing data that I need to push to a MySQL database. I'm parsing the HTML file to extract the values I need into scalars, and that part works. My problem starts when I need to collect data not from a single line of text but from multiple lines between certain patterns. Here is what I have so far, which more or less works:

  #!/usr/bin/perl
  binmode STDOUT,':encoding(cp1250)';

  my $file = "index.html";
  open FILE, '<', $file or die "Could not open $file: $!";
  my $word;
  my $description;
  my $origin;

  while (my $line = <FILE>)
  { 
    if ($line =~ m/(?<=)(.*)(?=)/)
    {
    $word = $line =~ m/<=(.*)/;
    $word = $1;     
    }

    if ($line =~ m/(?<=)/)
    {
    print $line;
    $origin = $line =~ m/
 (.*)/;
    $origin = $1;       
    }


  }

print "$word 
";
print "$origin";

Now I want to grab a few lines of text. They don't have to end up in a single scalar, but I don't know how many lines there will be. All I know is that the lines are between:



text I want
1.text I want
2.text I want
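One common Perl idiom for collecting an unknown number of lines between two markers is the range ("flip-flop") operator `..` in scalar context: it becomes true on the line matching the start pattern and stays true through the line matching the end pattern. The real markers did not survive in the question, so this minimal sketch assumes hypothetical ones, `<div class="post-content">` and `</div>` on their own lines (the structure the accepted answer targets), and reads from an in-memory string so it runs standalone:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; in practice you would open index.html instead.
my $html = <<'HTML';
<h1>word</h1>
<div class="post-content">
text I want
1.text I want
2.text I want
</div>
<p>something else</p>
HTML

# Hypothetical start/end markers; replace them with the real ones.
my $start = qr{<div class="post-content">};
my $end   = qr{</div>};

open my $fh, '<', \$html or die "Could not open in-memory file: $!";

my @wanted;
while (my $line = <$fh>) {
    # Flip-flop: true from the start-marker line through the end-marker line.
    if ($line =~ $start .. $line =~ $end) {
        # Skip the marker lines themselves; keep only what lies between them.
        next if $line =~ $start or $line =~ $end;
        chomp $line;
        push @wanted, $line;
    }
}
close $fh;

print "$_\n" for @wanted;
```

Each collected line lands in its own array element, so you never need to know the count in advance; the range simply closes when the end marker is seen.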

Plus I would like to get rid of

's

I thought of reading a line, storing it in a scalar, reading another line and comparing it to the previously saved scalar. But how am I supposed to check whether I have everything I want in that scalar?
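An alternative to comparing line by line is to read the whole file into one scalar and match across line breaks. With `$/` (the input record separator) undefined, a single read returns the entire file, and the `/s` modifier lets `.` match newlines, so the capture ends exactly at the end marker and there is nothing left to check. A sketch under the same kind of assumption: the markers `<div class="post-content">` and `</div>` are hypothetical stand-ins for the ones lost from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; in practice this would be index.html on disk.
my $html = <<'HTML';
<h1>word</h1>
<div class="post-content">
text I want
1.text I want
2.text I want
</div>
HTML

open my $fh, '<', \$html or die "Could not open in-memory file: $!";
# Slurp mode: with $/ undefined, one read returns the whole file.
my $content = do { local $/; <$fh> };
close $fh;

my @lines;
# /s lets . match newlines; the non-greedy .*? stops at the first closing marker.
if ($content =~ m{<div class="post-content">\s*(.*?)\s*</div>}s) {
    my $block = $1;              # copy the capture before doing more matching
    @lines = split /\n/, $block; # one element per original line
}

print "$_\n" for @lines;
```

This is fine for small pages; regexes over HTML break easily once the markup nests, which is the concern the accepted answer addresses.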

hwnd · Accepted Answer

Use the right tool for the job, an HTML parser, instead of a regular expression.

use strict;
use warnings;
use feature 'say';
use HTML::TreeBuilder;

my $tr = HTML::TreeBuilder->new_from_file('index.html');

for my $div ($tr->look_down(_tag => 'div', 'class' => 'post-content')) {
  for my $t ($div->look_down(_tag => 'p')) {
    say $t->as_text;
  }
}

Output

text I want 1.text I want 2.text I want
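As the output shows, `as_text` returns only the text content of each `<p>`, so the items come back joined on one line. If you need them separately (for example, one database row per item), one option is to split that string before each numbered entry. A small sketch, assuming the items are always introduced by a number followed by a dot, as in the sample:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The single line printed by the code above.
my $text = 'text I want 1.text I want 2.text I want';

# Split on whitespace that is followed by "<digits>." (a zero-width lookahead),
# so each number stays attached to its item. Assumes this numbering scheme.
my @items = split /\s+(?=\d+\.)/, $text;

print "$_\n" for @items;
```

The lookahead consumes nothing, so only the separating whitespace is removed and the "1." and "2." prefixes survive.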
