KingsInnerSoul

Reputation: 1382

Perl WWW::Mechanize encoding issue

I have the following code:

use strict;
use warnings;
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;

my $url = "http://example.com";
my $m = WWW::Mechanize->new();
$m->get($url);
my $c = $m->content;
my $tree = HTML::TreeBuilder::XPath->new_from_content( $c );

# Extract the text of the first <div class="content">
if (my $content = $tree->look_down(_tag => "div", class => "content")) {
    $content = $content->as_text();
}

The issue is that when I parse the content, text containing single or double curly quotes does not come out correctly. For example, “this” becomes â€œthisâ€.

My understanding is that this is a Windows-1252 encoding problem. How can I fix it?

I tried adding binmode STDOUT, ':encoding(utf-8)'; at the start of the program, but it did not help.

I also tried adding $content = utf8::decode($content); but that did not help either.

Upvotes: 0

Views: 287

Answers (1)

cjm

Reputation: 62099

Use

$m->decoded_content;

instead of

$m->content;
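
To illustrate the difference between raw bytes and decoded characters, here is a minimal sketch. It assumes the page is served as UTF-8 and builds an HTTP::Response by hand (a stand-in for the real fetch, so no network access is needed). The sample body and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;

# Hypothetical page body: curly quotes around "this", encoded as UTF-8 bytes.
my $bytes = encode('UTF-8', "\x{201C}this\x{201D}");

# Build a response by hand so the example runs without a network connection.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    $bytes,
);

my $raw     = $res->content;          # undecoded bytes as received
my $decoded = $res->decoded_content;  # character string, charset applied

# Encode characters as UTF-8 when printing.
binmode STDOUT, ':encoding(UTF-8)';
printf "raw length: %d, decoded length: %d\n", length $raw, length $decoded;
print "$decoded\n";
```

On a live WWW::Mechanize object, the same response object should be reachable through $m->response, so $m->response->decoded_content gives the decoded string. Combining a decoded string with the binmode line from the question should then print the quotes correctly.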

Upvotes: 2
