Reputation: 15
I tried to convert my Perl / HTML files to UTF-8. Unfortunately, I have a problem with forms. I created a small test script that demonstrates the problem. All it does is reload itself so that the entered text is shown again. It works fine with ASCII characters, but as soon as I enter German umlauts (ÄÖÜ), the characters get garbled. It cannot handle Russian characters (ЭЯЮ) either. Here is the script:
#!/usr/bin/perl
use utf8;
use Encode;
use open ':std', ':encoding(UTF-8)';
# Save query string in hash:
$querystring = $ENV{ 'QUERY_STRING' };
read(STDIN, $poststring, $ENV{CONTENT_LENGTH});
if (($querystring ne "") && ($poststring ne "")) { $querystring .= "&$poststring"; }
else { $querystring .= $poststring; }
$querystring =~ s/&/=/gi;
%query = split( /=/, $querystring );
foreach $key ( keys( %query ) ) {
$query{$key} =~ tr/+/ /;
$query{$key} =~ s/%([\da-f][\da-f])/chr( hex($1) )/egi;
$uquer{$key} = decode_utf8( $query{$key} );
}
print "Content-Type: text/html; charset=\"UTF-8\"\n\n";
print <<END;
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" content="text/html; charset=utf-8">
</HEAD>
<BODY>
<FORM NAME="frmeing" METHOD="POST" ACTION="test0.cgi">
<INPUT NAME="df_kurs" TYPE="TEXT" VALUE="$uquer{'df_kurs'}">
<INPUT TYPE="SUBMIT">
</FORM>
</BODY>
</HTML>
END
You can test this script as well. It is online at this address: http://project-website.org/test/test0.cgi Does anybody know what could be the problem? Thank you in advance for your help!
Upvotes: 1
Views: 1349
Reputation: 385764
It's due to a bug in your version of decode_utf8.
$ perl -Mutf8 -MEncode -E'
$u = $d = encode_utf8("é");
utf8::upgrade($u); # Changes how the string is stored internally
say $u eq $d ?1:0;
say decode_utf8($d) eq decode_utf8($u) ?1:0;
'
1
0
As you can see, $u and $d are equal, but your version of decode_utf8 decodes them differently. Specifically, it returns $u unchanged.
This has been fixed in newer versions of Encode. (2.53, I think.)
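If upgrading Encode is not an option, one possible workaround is to force the string back to byte storage before decoding. The following is only a minimal sketch: the helper name decode_utf8_bytes is hypothetical, and it assumes the input really is a string of UTF-8 encoded bytes (every character below 0x100).

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode UTF-8 bytes no matter how Perl stores
# the string internally.
sub decode_utf8_bytes {
    my ($bytes) = @_;    # work on a copy, the caller's string is untouched

    # Switch the copy back to byte storage if it was upgraded.
    # This dies if the string contains a character above 0xFF,
    # in which case it was never a byte string to begin with.
    utf8::downgrade($bytes);

    return Encode::decode('UTF-8', $bytes);
}
```

With this helper, both $u and $d from the demonstration above should decode to the same character string.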
The easier way to address the problem is to fix your own bug. With use open ':std', ':encoding(UTF-8)', you tell your program to decode STDIN from UTF-8 as it is read. The posted data is therefore decoded once before you unescape the URL-encoding, and then a second time by decode_utf8.
Fix:
#!/usr/bin/perl
use utf8; # Source code is encoded using UTF-8.
use open ':encoding(UTF-8)'; # Set default encoding for file handles.
BEGIN { binmode(STDOUT, ':encoding(UTF-8)'); } # HTML
BEGIN { binmode(STDERR, ':encoding(UTF-8)'); } # Error log
use Encode;
# Save query string in hash:
$querystring = $ENV{ 'QUERY_STRING' };
read(STDIN, my $poststring, $ENV{CONTENT_LENGTH});
if (($querystring ne "") && ($poststring ne "")) { $querystring .= "&$poststring"; }
else { $querystring .= $poststring; }
$querystring =~ s/&/=/gi;
%query = split( /=/, $querystring );
foreach $key ( keys( %query ) ) {
$query{$key} =~ tr/+/ /;
$query{$key} =~ s/%([\da-f][\da-f])/chr( hex($1) )/egi;
$uquer{$key} = decode_utf8( $query{$key} );
}
print "Content-Type: text/html; charset=\"UTF-8\"\n\n";
print <<END;
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" content="text/html; charset=utf-8">
</HEAD>
<BODY>
<FORM NAME="frmeing" METHOD="POST">
<INPUT NAME="df_kurs" TYPE="TEXT" VALUE="$uquer{'df_kurs'}">
<INPUT TYPE="SUBMIT">
</FORM>
</BODY>
</HTML>
END
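The parsing loop above can also be written as a small function that works on raw bytes throughout and decodes exactly once at the end. This is only a sketch under assumptions: parse_form is a hypothetical name, it expects the undecoded query or POST string, it unescapes keys as well as values, and it keeps only the last value when a field repeats.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn a raw application/x-www-form-urlencoded
# string (bytes, not yet decoded) into a hash of character strings.
sub parse_form {
    my ($raw) = @_;
    my %form;
    for my $pair (split /&/, $raw // '') {
        # Split on the first "=" only; a missing value becomes ''.
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                              # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;  # percent-escapes -> bytes
            $_ = Encode::decode('UTF-8', $_);     # bytes -> characters, once
        }
        $form{$key} = $value;
    }
    return %form;
}
```

In the script above, this would replace the s/&/=/ substitution, the split and the foreach loop with a single call such as my %uquer = parse_form($querystring);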
But you really should use CGI.pm.
#!/usr/bin/perl
use strict; # Always!
use warnings; # Always!
use utf8; # Source code is encoded using UTF-8.
use open ':encoding(UTF-8)'; # Set default encoding for file handles.
BEGIN { binmode(STDOUT, ':encoding(UTF-8)'); } # HTML
BEGIN { binmode(STDERR, ':encoding(UTF-8)'); } # Error log
use CGI qw( -utf8 );
use Encode;
my $cgi = CGI->new();
my %uquer = $cgi->Vars();
print $cgi->header('text/html; charset=UTF-8');
print <<END;
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" content="text/html; charset=utf-8">
</HEAD>
<BODY>
<FORM NAME="frmeing" METHOD="POST">
<INPUT NAME="df_kurs" TYPE="TEXT" VALUE="$uquer{'df_kurs'}">
<INPUT TYPE="SUBMIT">
</FORM>
</BODY>
</HTML>
END
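Both versions interpolate the submitted text straight into the VALUE attribute, so input containing a double quote or an angle bracket would break the markup. A minimal sketch of escaping the value before printing follows. The escape_html helper is hypothetical and written here only for illustration; CGI.pm's own escapeHTML function could be used in its place.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML
# text and in quoted attribute values.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;    # treat a missing field as empty
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}
```

The here-document would then interpolate a pre-escaped variable, for example my $value = escape_html($uquer{'df_kurs'}); followed by VALUE="$value" in the INPUT tag.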
Upvotes: 6