Scoox

Reputation: 181

Perl XML Simple XMLOut encoding problem and losing newlines

I am having a bit of trouble. I am writing a script that grabs news from the European Parliament. It grabs the content from e.g.

http://www.europarl.europa.eu/sides/getDoc.do?pubRef=-//EP//TEXT+IM-PRESS+20110401STO16789+0+DOC+XML+V0//BG

I save the content with the following code:

my $fh;
open($fh, ">", "articles/" . $article{"ref"} . ".xml") or die "Cannot open file: $!";
XMLout($ref, OutputFile => $fh, XMLDecl => "<?xml version='1.0' encoding='utf-8' ?>", KeyAttr => ["lang"]);
close($fh);

This works the first time I do it. However, when I read the file again via XMLin, it loses its newlines and, depending on the content written, some characters get corrupted.

This example script reproduces the problem:

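use strict;
use warnings;
use XML::Simple;

# Read the previously saved article, then write it straight back out.
my $ref = XMLin("articles/20110401STO16789.xml");
open(my $fh, ">", "test.xml") or die "Cannot open test.xml: $!";
XMLout($ref, OutputFile => $fh, XMLDecl => "<?xml version='1.0' encoding='utf-8' ?>", KeyAttr => ["lang"]);
close($fh);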

Do you have any idea why this happens?

I have also uploaded the scripts, the example script, and two XML files to: http://www.stephan-muller.com/euronews.zip

Thank you in advance for your help!

Upvotes: 2

Views: 1394

Answers (1)

daxim

Reputation: 39158

Don't put content in attribute values. Put content into element content. Whitespace is significant there.
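As a rough illustration of the point above: by default XMLout writes scalar hash values as attributes, and an XML parser normalizes literal line breaks inside attribute values to spaces when the file is read back. The NoAttr option makes XMLout emit child elements instead. The sketch below uses a made-up hash, a made-up key (title), and a made-up root name (article), not the asker's real data, and works on in-memory strings rather than files.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin XMLout);

# Hypothetical data: one text value that contains a newline.
my $data = { title => "line one\nline two" };

# Default behaviour: the scalar value ends up in an attribute.
my $as_attr = XMLout($data, RootName => 'article');

# NoAttr => 1: the scalar value ends up in a child element instead.
my $as_elem = XMLout($data, RootName => 'article', NoAttr => 1);

# Parse both strings back and check whether the newline is still there.
my $from_attr = XMLin($as_attr);
my $from_elem = XMLin($as_elem);

for my $case ( [ attribute => $from_attr ], [ element => $from_elem ] ) {
    my ($label, $parsed) = @$case;
    my $kept = $parsed->{title} =~ /\n/ ? "kept" : "lost";
    print "$label round trip: newline $kept\n";
}
```

With a conforming parser, the element version should come back with its newline intact, while the attribute version should not. For the character corruption mentioned in the question, one thing worth checking (an assumption about the cause, not a confirmed diagnosis) is whether the output filehandle has an encoding layer, for example by opening it with ">:encoding(UTF-8)" so the bytes written match the declared encoding.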

Upvotes: 2
