Jasmine Lognnes

Reputation: 7097

How to convert XML characters to utf8?

When I dump XML parsed with XML::Simple, I end up with strings that contain escaped characters such as \x{e6}. Here is an example:

#!/usr/bin/perl
use Data::Dumper;
use Encode;

$s="sel\x{e6}re";
decode_utf8($s);
print Dumper $s;

outputs

$VAR1 = 'sel�re';

Question

How can I get the weird character into UTF-8?

Update

Here is the full XML output: http://pastebin.com/Sitm01kh

Update 2

As pointed out in the comments, the XML is fine, but the problem appears when I run:

my $ref = XMLin($xml, ForceArray => 1, KeyAttr => { Element => 'Id' });
print Dumper $ref;

http://pastebin.com/7KDB50fd

Upvotes: 1

Views: 425

Answers (2)

sotona

Reputation: 2016

#!/usr/bin/perl

use DDP;
use XML::Simple;

my $xml = '<Element Id="496669" ParentId="495555" Name="Klasselærere" ContextName="01005 Advanced Engineering Mathematics 1 E15/Klasselærere" IsArchived="false" SubgroupCount="0" />';

my $result = XMLin($xml);

binmode(STDOUT, ":utf8");
print p($result)

produces the following output

{
   ContextName     "01005 Advanced Engineering Mathematics 1 E15/Klasselærere",
   Id              496669,
   IsArchived      "false",
   Name            "Klasselærere",
   ParentId        495555,
   SubgroupCount   0
   }

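Data::Dumper escapes non-ASCII characters in decoded strings (for example as \x{e6}), so its output looks odd even though the data is fine. Use Data::Printer to see the Unicode characters themselves.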
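
To check that the string itself is intact, it can help to encode it explicitly and look at the bytes. The following is only a minimal sketch: the variable names are made up for illustration, and it assumes the value is a decoded character string like the ones XMLin returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# A character string holding U+00E6, standing in for a value returned by XMLin.
my $name = "sel\x{e6}re";

# Encode explicitly to UTF-8 bytes; U+00E6 becomes the two bytes 0xC3 0xA6.
my $bytes = encode_utf8($name);

# Show the byte values in hex so the result does not depend on the terminal.
my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
print "$hex\n";    # prints: 73 65 6c c3 a6 72 65

# Alternatively, let an output layer encode every string printed to STDOUT.
binmode(STDOUT, ':encoding(UTF-8)');
print "$name\n";
```

If the hex dump shows c3 a6, the data is already correct and only the way it is displayed needs attention.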

Upvotes: 1

jhoran

Reputation: 340

I guess that your terminal is not able to display the character \xe6.

If you are on Linux, run 'locale' to see your terminal's settings.

You can try setting the terminal encoding like this:

export LC_ALL=en_US.UTF-8   # assumes this locale is installed; 'locale -a' lists the available ones

Upvotes: 1
