Reputation: 23866
I have the following simple Perl wrapper for a Sqlite database:
#! /usr/bin/perl
use strict;
use warnings;
use DBI;
use Data::Dumper;
my $sql = shift;
my $dbh = DBI->connect(
"dbi:SQLite:dbname=data.sqlite3",
"", # no user
"", # no pw
{
RaiseError => 1,
sqlite_unicode => 1
},
) || die $DBI::errstr;
my $sth = $dbh->prepare($sql);
$sth->execute();
print Dumper ($sth->fetchall_arrayref({}));
$sth->finish();
$dbh->disconnect();
Although I have set sqlite_unicode
flag, as explained in the documentation, I can not execute queries containing Unicode characters:
$ ./sqlite.pl "select * from person where lastname = 'Schütte'"
$VAR1 = [];
When I mask the 'ü' it seems to work, although I am not sure, if the \x{fc}
means Latin 1 FC
or Unicode U+00FC
.
$ ./sqlite.pl "select * from person where lastname like 'Sch%tte'"
$VAR1 = [
{
'id' => 8,
'firstname' => undef,
'lastname' => "Sch\x{fc}tte"
}
];
When I do the same with the Sqlite command line tool it works fine:
$ sqlite3 data.sqlite3 "select * from person where lastname = 'Schütte'"
8||Schütte
Did I forget anything to tell the DBI layer to support Unicode characters?
My local encoding is UTF-8:
$ locale
LANG=de_DE.utf8
LANGUAGE=
LC_CTYPE="de_DE.utf8"
LC_NUMERIC="de_DE.utf8"
LC_TIME="de_DE.utf8"
LC_COLLATE="de_DE.utf8"
LC_MONETARY="de_DE.utf8"
LC_MESSAGES="de_DE.utf8"
LC_PAPER="de_DE.utf8"
LC_NAME="de_DE.utf8"
LC_ADDRESS="de_DE.utf8"
LC_TELEPHONE="de_DE.utf8"
LC_MEASUREMENT="de_DE.utf8"
LC_IDENTIFICATION="de_DE.utf8"
LC_ALL=
Upvotes: 3
Views: 1186
Reputation: 385917
The first question you need to ask yourself is:
Should the query be encoded (and if so, in which encoding), or should it be decoded (i.e. Unicode Code Points)?
This should be based on sqlite_unicode
, but it's not. That only affects returned strings. I suspect that DBD::SQLite, like most/all DBDs, suffer from The Unicode Bug, which means the answer to the question is based on internal details of how the string is stored.
In a UTF-8 system, it usually "just works", and I'm surprised it doesn't here. Please try
use Encode qw( decode_utf8 );
my $sql = decode_utf8(shift);
If that doesn't work, please provide the output the following (before you shift the query out of @ARGV
):
use Devel::Peek qw( Dump );
Dump($ARGV[0]);
When I mask the 'ü' it seems to work, although I am not sure, if the
\x{fc}
means Latin 1 FC or UnicodeU+00FC
.
The question doesn't make any sense. In both iso-latin-1 and Unicode, FC
is "ü". I believe you are actually asking
Are the returned strings encoded using iso-latin-1, or are they decoded strings (i.e. Unicode Code Points)?
With sqlite_unicode
, they are the latter. You'll need to encode your output. You can do so as follows:
use open ':std', ':encoding(UTF-8)';
Upvotes: 3