giordano
giordano

Reputation: 3152

Import of xml with XPath and encoding

I would like to import data from an xml with german umlauts (ä,ö,ü). To analyze the problem I imported the data as sql-insert, csv-file and xml-file. To see the source of the imports the entries are marked with prefix insert,csv and xml. Here is the result:

The umlauts from the xml source are not correctly imported.
Here are the codes to produce this:

Table definition

DROP TABLE IF EXISTS test01.animal;
CREATE TABLE test01.animal (
name VARCHAR(50) DEFAULT NULL
, category VARCHAR(50) DEFAULT NULL
) ENGINE = InnoDB 
DEFAULT CHARACTER SET = utf8 COLLATE = utf8_unicode_ci
;

INSERT

INSERT INTO test01.animal (name, category)
VALUES
('insert_snäke','reptile')
,('insert_frög','amphibian')
,('insert_tüna','fish')
,('insert_racoon','mammal')
;

CSV
data

csv_snäke,reptile
csv_frög,amphibian
csv_tüna,fish
csv_racoon,mammal

sql

LOAD DATA INFILE 'C:/animals.csv'
INTO TABLE test01.animal
FIELDS TERMINATED BY ','
;

XML
data

<?xml version="1.0" encoding="UTF-8"?>
<database>
 <select query="SELECT name, category FROM animal">
  <row>
   <name>xml_snäke</name>
   <category>reptile</category>
  </row>
  <row>
   <name>xml_frög</name>
   <category>amphibian</category>
  </row>
  <row>
   <name>xml_tüna</name>
   <category>fish</category>
  </row>
  <row>
   <name>xml_racoon</name>
   <category>mammal</category>
  </row>
 </select>
</database>

Perl/XPath

use strict;
use DBI;
use XML::XPath;
use XML::XPath::XMLParser;

my $dir0 = "C:";

my $dbh = DBI->connect ("DBI:mysql:test01",
                        "root", "fire2013",
                        { RaiseError => 1, PrintError => 0});
my $xp = XML::XPath->new (filename => "$dir0/animals.xml");
my $nodelist = $xp->find ("//row");
foreach my $row ($nodelist->get_nodelist ())
    {
        $dbh->do (
            "INSERT INTO animal (name, category) VALUES (?,?)",
            undef,
            $row->find ("name")->string_value (),
            $row->find ("category")->string_value ()
            );
    }
$dbh->disconnect ();

Any idea why I don't get umlauts by importing xml? Any help appreciated. Thanks.
PS: Windows 7 / MariaDB 5.5.31 / Strawberry Perl 5.16

Upvotes: 0

Views: 165

Answers (1)

gangabass
gangabass

Reputation: 10666

Try this:

$dbh->do(qq{SET NAMES 'utf8'}) or die $dbh->errstr;

Upvotes: 1

Related Questions