Getting html content between specific <div> tag only

Question

I wrote the following code to scrape the text content between a specific <div> tag and the next </div> tag, but it only prints out the whole HTML source.

use LWP::Simple;

$url = 'http://domain.com/?xxxxxxx';

my $content = get($url);

$data =~ m/(.*?)<\/div>/g;

if (is_success(getprint($url))) {
    print $_;
}

# or using the following line directly without if statement
print $data;

The HTML piece that I'm interested in looks like this:

<div id="aaa-bbb">
text text text text text text text text text
text text text
</div>

That specific div tag id appears only once in the whole HTML document.

I'm also looking to strip out tags or tidy the output by line breaks, so I can store it as a text file later or reuse it.
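For reference, a minimal sketch of the regex approach with two likely problems addressed. In the snippet above, the page is stored in $content but the match runs against $data, which is never assigned, and getprint() itself fetches and prints the whole page, which would explain the output. The sample HTML, the id "aaa-bbb" (borrowed from the later attempt) and the assumption that line breaks are <br> tags are all illustrative, not taken from the real page.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the page that get($url) would return.
my $content = <<'HTML';
<html><body>
<div id="other">skip me</div>
<div id="aaa-bbb">text text text<br>
text text text</div>
</body></html>
HTML

# Match against $content (the variable that actually holds the page).
# The /s flag lets "." match newlines, so the capture can span several lines.
my $data = '';
if ( $content =~ m{<div id="aaa-bbb">(.*?)</div>}s ) {
    $data = $1;
}

# Tidy the captured fragment for storage as plain text.
$data =~ s{<br\s*/?>\s*}{\n}gi;    # assumption: line breaks are <br> tags
$data =~ s{<[^>]+>}{}g;            # drop any remaining tags
$data =~ s{^\s+|\s+$}{}g;          # trim leading and trailing whitespace

print "$data\n";
```

A non-greedy match like this stops at the first </div>, so it will cut the content short if the target div contains nested divs; an HTML parser is the safer choice in that case.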

After reading your valuable comments, I tried using WWW::Mechanize and WWW::Mechanize::TreeBuilder instead, like this:

use strict;
use warnings;

use WWW::Mechanize; 
use WWW::Mechanize::TreeBuilder; 

my $mech = WWW::Mechanize->new; 
WWW::Mechanize::TreeBuilder->meta->apply($mech); 

$mech->get( 'domain.com/?xxxxxx' ); 

my @list = $mech->find('div id="aaa-bbb"'); # also tried other quoting variants
foreach (@list) { 
  print $_->as_text(); 
}

It works for simple tags, but I can't get it to work with <div id="aaa-bbb">. The script just exits without printing anything. I tried both double and single quotes; the tag id itself already contains double quotes.
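A possible explanation, sketched below: find() in HTML::Element looks elements up by tag name only, so a string such as 'div id="aaa-bbb"' is treated as one (nonexistent) tag name and nothing is returned. look_down() takes the tag name and attribute values as separate criteria. The sketch uses HTML::TreeBuilder directly on a hypothetical sample string so it runs without a network connection; WWW::Mechanize::TreeBuilder is documented as delegating look_down to the parsed tree, so the same call on $mech should behave the same way, but that part is untested here.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample standing in for the fetched page.
my $html = <<'HTML';
<html><body>
<div id="other">skip me</div>
<div id="aaa-bbb">text text text text text text</div>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Tag name and attribute are given as separate criteria.
# In scalar context look_down returns the first matching element, or undef.
my $div = $tree->look_down( _tag => 'div', id => 'aaa-bbb' );

my $text = $div ? $div->as_text : '';
print "$text\n";

# Free the parse tree explicitly once done with it.
$tree->delete;
```

With the Mechanize setup from the question, the equivalent line would be my @list = $mech->look_down( _tag => 'div', id => 'aaa-bbb' ); note also that $mech->get() needs a full URL including the scheme, for example 'http://domain.com/?xxxxxx'.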
