Reputation: 81
I'm trying to post content to a website with WWW::Mechanize.
My content seems to be UTF-8, and the page I post it to declares ISO-8859-15 encoding in the head of its HTML.
The post works, but the text comes out garbled.
Example of the result I get (in French):
acteur majeur de l?assurance et
référence en gestion
patrimoniale, propose une approche globale pour
une clientèle aisée et haut de gamme.
Here is my code:
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;
use Encode;
use open qw(:std :utf8);

my $mech = WWW::Mechanize->new(
    stack_depth => 0,
    timeout     => 10,
);

# Fetch the source page and extract the text of the content div
$mech->get($urlContentOtherWebsite);
my $tree = HTML::TreeBuilder::XPath->new_from_content($mech->content);
my $content = $tree->findvalue('/html/body//div[@id="content"]');
$tree->delete;

# Load the form on my site, fill it in and submit it
$mech->get($urlFormMyWebsite);
$mech->form_name("formular"); # Form Post Emploi
$mech->set_fields(
    content => $content
);
$mech->submit;
Do you have any idea or clue how to resolve my problem, please?
Upvotes: 1
Views: 1152
Reputation: 123320
From studying the code:
HTML::Form, which is used inside WWW::Mechanize, uses the accept-charset attribute of the <form...> tag to find out which encoding to use. If there is no such attribute, then it uses a default charset, which is UTF-8. You can set the acceptable charset with $form->accept_charset('iso-8859-1'), e.g. the following should work if I read the code correctly:
$mech->form_name("formular")->accept_charset('iso-8859-1');
$mech->set_fields(...);
$mech->submit;
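To illustrate why the form charset matters, here is a minimal sketch that uses only the core Encode module. The sample string and the variable names are made up for the example and are not taken from the question. It encodes the same text as UTF-8 and as ISO-8859-15, then decodes the UTF-8 bytes the way a server expecting ISO-8859-15 would.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample text, written with escapes so the script does not
# depend on the encoding of its own source file.
my $text = "r\x{e9}f\x{e9}rence";    # "reference" with two e-acute characters

my $utf8_bytes   = encode('UTF-8', $text);         # each e-acute becomes 2 bytes (C3 A9)
my $latin9_bytes = encode('ISO-8859-15', $text);   # each e-acute becomes 1 byte  (E9)

# What a server expecting ISO-8859-15 makes of the UTF-8 bytes:
# every e-acute turns into two separate characters.
my $misread = decode('ISO-8859-15', $utf8_bytes);

binmode STDOUT, ':encoding(UTF-8)';
printf "UTF-8 length:       %d bytes\n", length $utf8_bytes;
printf "ISO-8859-15 length: %d bytes\n", length $latin9_bytes;
print  "Misread as:         $misread\n";
```

If the form is submitted with the charset the server expects, the bytes sent correspond to the second encoding rather than the first, which is what the accept_charset call is meant to achieve.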
Upvotes: 3
Reputation: 126722
You need to add
binmode STDOUT, ':encoding(utf-8)';
at the start of your program to declare that STDOUT is expecting UTF-8 characters; otherwise you will see the individual bytes instead of the proper characters.
You also need to decode the input as UTF-8 using
use Encode;
followed by
decode('UTF-8', $_)
where the incoming text is in $_.
Here's an example:
use utf8;
use strict;
use warnings;
use Encode;
binmode STDOUT, ':encoding(utf-8)';
print decode('UTF-8', $_) for <DATA>;
__DATA__
acteur majeur de l?assurance et
référence en gestion
patrimoniale, propose une approche globale pour
une clientèle aisée et haut de gamme.
Output:
acteur majeur de l?assurance et
référence en gestion
patrimoniale, propose une approche globale pour
une clientèle aisée et haut de gamme.
I don't quite understand l?assurance, but I imagine that the data has been altered somewhere between the original web site and the Stack Overflow post. As you can see, the rest of the text is correct.
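One possible explanation for the `?`, offered as a guess rather than something the question confirms: if the source text used a typographic apostrophe (U+2019), that character has no equivalent in ISO-8859-15, and Encode substitutes a placeholder when it encodes with the default CHECK setting. A small sketch under that assumption, with a made-up sample string:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: the original text contained U+2019 (right single quotation mark)
# instead of a plain ASCII apostrophe.
my $text = "l\x{2019}assurance";

# With no CHECK argument, characters that cannot be mapped to the target
# encoding are replaced by that encoding's substitution character.
my $bytes = encode('ISO-8859-15', $text);

print "$bytes\n";
```

If that is what happened, the character is lost at encoding time and cannot be recovered by decoding afterwards; it would have to be replaced with a plain apostrophe before the text is encoded.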
Upvotes: 1