tim

Reputation: 113

Perl DBI/Mysql Unicode Bug

I'm not sure if it's a bug or I'm doing something wrong:

I read the data like this:

open my $fh, "<:encoding(iso-latin1)", $file or die "Failed to open $file: $!";

$file is definitely in iso-latin1.

Then I have a MySQL table defined with

ENGINE=InnoDB AUTO_INCREMENT=53072 DEFAULT CHARSET=latin1

I check the connection settings:

$dbh->prepare("show variables");

Which gives

character_set_client, latin1
character_set_connection, latin1
character_set_database, latin1
character_set_filesystem, binary
character_set_results, latin1
character_set_server, latin1
character_set_system, utf8

So to me everything should be fine:

But the data in the table ends up as plain UTF-8 (most probably Perl's internal format in this case).

Did I miss something, or is this maybe a bug in DBI/DBD::mysql?

Upvotes: 1

Views: 250

Answers (1)

Dave Cross

Reputation: 69264

My guess would be that you're right and this data is in Perl's internal character format. The sequence goes like this.

  • Data in input file stored as Latin-1 bytes
  • Data read from input file and auto-converted to Perl characters because of the encoding option on your open statement
  • Data sent to MySQL as Perl characters
  • MySQL slightly confused by getting UTF8 instead of Latin-1, but stores it anyway as best it can
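The sequence above can be reproduced without a database. This is only a sketch using the core Encode module; the sample string "café" is made up for illustration, and it assumes that what reaches MySQL is the UTF-8 form of the decoded string, which is my guess rather than something I have verified in DBD::mysql.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "café" as it sits in a Latin-1 file: 4 bytes, the last one is 0xE9.
my $latin1_bytes = "caf\xE9";

# What the :encoding(...) layer on open does: bytes -> Perl characters.
my $chars = decode('iso-8859-1', $latin1_bytes);

# The same characters serialised as UTF-8 (what seems to land in the table).
my $utf8_bytes = encode('UTF-8', $chars);

# The same characters serialised as Latin-1 (what a latin1 table expects).
my $back = encode('iso-8859-1', $chars);

printf "characters:   %d\n", length $chars;
printf "UTF-8 bytes:  %d\n", length $utf8_bytes;
printf "Latin-1 bytes: %d\n", length $back;
```

The UTF-8 form is one byte longer because "é" takes two bytes there, which is the kind of difference you would see when inspecting the stored column.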

The step you're missing is to encode your Perl characters back into Latin-1 before sending them to the database. The obvious solution is to call encode('iso-8859-1', $string) on every value you send to the database. It would be nice if there was some kind of auto-encode option, but I can't find one.
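Something along these lines might save repeating the encode call for every value. The helper name latin1_binds is hypothetical (it is not part of DBI or DBD::mysql), and the sample values are invented; treat it as a sketch rather than a tested fix for your setup.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode each Perl character string to Latin-1 bytes,
# passing undef (SQL NULL) through untouched.
sub latin1_binds {
    return map { defined $_ ? encode('iso-8859-1', $_) : undef } @_;
}

# Invented sample row: two strings containing "ü" and one NULL.
my @row     = ("M\x{FC}ller", undef, "Z\x{FC}rich");
my @encoded = latin1_binds(@row);

# With a real statement handle the call would look something like:
#   $sth->execute(latin1_binds(@row));
printf "%s\n", join ', ', map { defined $_ ? length($_) . ' bytes' : 'NULL' } @encoded;
```

Note that with Encode's default settings, any character that has no Latin-1 equivalent is replaced by a substitution character instead of raising an error, so you may want stricter checking if the input could ever contain such characters.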

Of course, if your data is all going to be Latin-1, then you could consider just ignoring any decoding/encoding issues. It should all just work without that complication.
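For the "just ignore it" route, the idea would be to read the file as raw bytes instead of decoding it. A small sketch, using a temporary file with made-up content so it runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file holding "café" as Latin-1 bytes (illustrative data).
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "caf\xE9\n";
close $out or die "Failed to close $file: $!";

# Read it back with no decoding layer: the string stays a sequence of bytes.
open my $fh, '<:raw', $file or die "Failed to open $file: $!";
my $line = <$fh>;
close $fh;
chomp $line;

printf "%d bytes, last byte 0x%02X\n", length($line), ord(substr($line, -1));
```

The bytes then go to the database exactly as they were in the file. The trade-off is that Perl treats the data as bytes, so anything character-aware (regex classes, uc/lc, sorting) may not behave as you expect on the accented characters.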

Upvotes: 1
