Reputation: 655
I have a legacy web application using Perl, CGI, and MySQL 5.5.62. In fields completed by customers, I need to support the umlaut and emoji characters that they often use in their input.
In an attempt to learn, I set up the following stand-alone test. (It's deliberately very simple and lacking basic security checks on input.)
Dump of database widget
:
DROP TABLE IF EXISTS `experiment`;
CREATE TABLE `experiment` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(40) CHARACTER SET utf8mb4 DEFAULT NULL,
`content` text CHARACTER SET utf8mb4,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
LOCK TABLES `experiment` WRITE;
INSERT INTO `experiment` VALUES (1,'Record','Now is the time for all good men to come to the aid of their country. 😀\r\nThe quick brown fox jumped over the lazy dög.');
UNLOCK TABLES;
Perl code:
#!/usr/bin/perl -T
use strict;
use warnings;
use DBI;
use CGI '-utf8';
my $dbh = DBI->connect('DBI:mysql:widget','test','test', { mysql_enable_utf8 => 0,}) or die "Can't connect to the database: $DBI::errstr";
my $sth = $dbh->prepare('SELECT * FROM `experiment`') or die "Couldn't prepare statement: " . $dbh->errstr;
$sth->execute or die "Can't execute SQL statement: $DBI::errstr";
my $hashref = $sth->fetchrow_hashref or die "Can't fetchrow_hashref: $DBI::errstr\n";
$sth->finish;
my $search = '';
for my $i (qw(fox dog)) {
$sth = $dbh->prepare("SELECT * FROM `experiment` WHERE `content` LIKE '%$i%'") or die "Couldn't prepare statement: " . $dbh->errstr;
my $count = $sth->execute or die "Can't execute SQL statement: $DBI::errstr";
$search .= "<h6>String: [$i] found [$count]</h6>";
}
$sth->finish;
my $action = CGI::param('action') || '';
if ($action eq 'save') {
my $new = CGI::param('value') || '';
$sth = $dbh->prepare("UPDATE `experiment` SET `content` = '$new' WHERE `id` = 1") or die "Couldn't prepare statement: " . $dbh->errstr;
$sth->execute or die "Can't execute SQL statement: $DBI::errstr";
$sth->finish;
print "Location: http://simulated-domain-name.com/cgi-bin/test.cgi\n\n";
exit;
}
$dbh->disconnect;
print <<EOF;
Content-type: text/html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous"> <!-- not part of the experiment, just make it look nice -->
</head>
<body>
<div class="container my-3">
<h5>Content = $hashref->{content}</h5>
$search
<form method="post">
<input type="hidden" name="action" value="save">
<div class="form-group">
<label class="font-weight-bold" for="exampleFormControlTextarea1">Content</label>
<textarea name="value" class="form-control" id="exampleFormControlTextarea1" rows="3">$hashref->{content}</textarea>
</div>
<button type="submit" class="btn btn-primary">Submit</button>
</form>
</div>
</body>
</html>
EOF
exit;
As far as I can tell, the '-utf8'
on the line with use CGI
has no effect.
When using mysql_enable_utf8 = 0
the program works beautifully but accent insensitive search fails. Umlauts and emoji are properly displayed in the browser.
Output:
Content = Now is the time for all good men to come to the aid of their country. 😀 The quick brown fox jumped over the lazy dög. String: [fox] found [1] String: [dog] found [0E0]
When using `mysql_enable_utf8 = 1' the HTML output looks scrambled.
Output:
Content = Now is the time for all good men to come to the aid of their country. 😀 The quick brown fox jumped over the lazy dög. String: [fox] found [1] String: [dog] found [0E0]
I feel like I'm close, but missing something important.
Upvotes: 1
Views: 509
Reputation: 655
This is the solution that worked for me on CentOS 7:
mysql_enable_utf8mb4
option in DBI->connect
binmode(STDOUT, ":encoding(UTF-8)");
to top of Perl scriptOutput is now as desired:
Content = Now is the time for all good men to come to the aid of their country. 🤣 The quick brown fox jumped over the lazy dög. String: [fox] found [1] String: [dog] found [1]
Upvotes: 0
Reputation: 142503
Emoji --> CHARACTER SET utf8mb4
.
Accent insensitive --> any COLLATION utf8mb4_..._ci
.
Since you are on the relatively antique 5.5, you may run into the "767 problem". See http://mysql.rjweb.org/doc.php/limits#767_limit_in_innodb_indexes
If you have Question marks or Mojibake (such as dög
for dög
), see Trouble with UTF-8 characters; what I see is not what I stored
My notes on Perl:
use utf8;
use open ':std', ':encoding(UTF-8)';
my $dbh = DBI->connect("dbi:mysql:".$dsn, $user, $password, {
PrintError => 0,
RaiseError => 1,
mysql_enable_utf8 => 1, # Switch to UTF-8 for communication and decode.
});
# or {mysql_enable_utf8mb4 => 1} if using utf8mb4
(I don't have any notes on use CGI
.)
Upvotes: 4