Jason

Reputation: 17976

Coldfusion XMLFormat() not converting all characters

I am using XMLFormat() to encode some text for an XML document. However, when I go to read the XML file I created I get an invalid character error. Why does XMLFormat() not properly encode all characters?

I'm running CF8.

Upvotes: 3

Views: 5363

Answers (6)

crosenblum

Reputation: 1927

This was a huge issue for me as well, and it turned out that the charset was the main factor: you need to specify the correct charset explicitly.

In my case the XML contained text in foreign languages, and it wasn't parsed correctly until I specified the correct charset.

Upvotes: 0

Adam Tuttle

Reputation: 19834

Unfortunately, XMLFormat() is just not an all-inclusive solution. It replaces only a very limited list of characters (see the XMLFormat() documentation).

You'll need to do custom encoding of characters that are invalid for XML but not covered by XMLFormat().

It's definitely not very efficient, but one potential solution would be to loop over the content of typically suspect fields (anything user-generated, for starters) character by character, check each character's code, and if it's above 255, either omit the character or encode it properly.
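A minimal sketch of the "omit" option follows. It assumes (an assumption, not something stated above) that the characters triggering the parser error are the C0 control characters XML 1.0 forbids outright: everything below `0x20` except tab, line feed and carriage return. These cannot be rescued by encoding, because a reference such as `&#x0B;` is itself illegal in XML 1.0. The helper name `stripInvalidXmlChars` and the sample `myText` value are hypothetical. ColdFusion strings are `java.lang.String`, so the sketch calls Java's `replaceAll()` on the string, and the `\xHH` escapes are interpreted by Java's regex engine.

```cfml
<cfscript>
  // Hypothetical helper: removes the characters XML 1.0 never allows
  // (C0 controls other than tab, LF and CR), not even as character references.
  function stripInvalidXmlChars(text) {
      var s = toString(arguments.text);
      // CF string literals do not process backslashes, so Java's regex engine
      // receives the \xHH escapes unchanged and interprets them as hex codes.
      return s.replaceAll("[\x00-\x08\x0B\x0C\x0E-\x1F]", "");
  }

  // Hypothetical sample input containing a vertical tab (chr(11)).
  myText = "Line one" & chr(11) & "Line two";

  // Strip the illegal characters first, then escape the markup characters.
  myText = xmlFormat(stripInvalidXmlChars(myText));
</cfscript>
```

Stripping happens before XMLFormat() so that the escaping step only ever sees characters that are legal in the output document.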

Upvotes: 0

LucasS

Reputation: 461

If you're trying to return your XML directly to the browser so the user can download it, you might want to try something like this:

<cfheader name="Content-Disposition" value="attachment; filename=export.xml">
<!--- cfcontent's variable attribute expects binary data, so encode the string with the charset named in type --->
<cfcontent type="text/xml; charset=utf-8" variable="#charsetDecode(someXMLPacket, 'utf-8')#" reset="true">

Or, if you want it returned as a web page (à la REST), this should do the trick:

<cfcontent type="text/xml; charset=utf-8" variable="#charsetDecode(someXMLPacket, 'utf-8')#" reset="true">

Hope that helps.

Upvotes: 0

kevink

Reputation: 1978

I feel that this is a bug in XMLFormat(). I'm not sure who originally wrote the snippet below, but here is an approach that catches the extra characters via regex:

  <cfscript>
      // Escapes XML special characters with XMLFormat(), then replaces every
      // remaining non-ASCII character with a hexadecimal character reference.
      function xmlFormatHighChars(text) {
          var result = xmlFormat(arguments.text);
          var i = ReFind('[^\x00-\x7F]', result, 1); // position of the first high chr (0 if none)
          var tmp = '';
          while (i gt 0) {
              tmp = '&##x#FormatBaseN(Asc(Mid(result, i, 1)), 16)#;'; // convert the high chr to a hex numeric chr reference
              result = Insert(tmp, result, i); // insert the reference just after the high chr
              result = RemoveChars(result, i, 1); // delete the redundant high chr from the string
              i = ReFind('[^\x00-\x7F]', result, i + Len(tmp)); // resume scanning after the inserted reference
          }
          return result;
      }

      myText = xmlFormatHighChars(myText);
  </cfscript>

Upvotes: 5

rparente

Reputation: 59

Also, don't forget to put <cfprocessingdirective pageencoding="utf-8"> at the top of your template, so that ColdFusion reads any literal non-ASCII text in the template source as UTF-8.

Upvotes: 0

Tomalak

Reputation: 338406

Are you sure you are writing the file in the right encoding? You can't just do

<cffile action="write" file="foo.xml" output="#xml#" />

because the resulting encoding very likely differs from the character set your XML is in. Unless otherwise noted (by an encoding declaration), XML files are treated as UTF-8, so you should do:

<cffile action="write" file="foo.xml" output="#xml#" charset="utf-8" />
<!--- and --->
<cffile action="read" file="foo.xml" variable="xml" charset="utf-8" />
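To keep the write and read sides consistent, the file can also carry an explicit encoding declaration, and `XmlParse()` can be handed the file path so the parser reads the bytes and applies that declaration itself. This is a sketch under assumptions: `someText`, `xmlPath` and the sample content are hypothetical, and it assumes CF7 or later, where `XmlParse()` accepts a file name as well as an XML string.

```cfml
<!--- Hypothetical sample text containing markup characters and a non-ASCII letter --->
<cfset someText = "Caf" & chr(233) & " & Co <test>">
<cfset xmlPath = getTempDirectory() & "foo.xml">

<!--- Declare the encoding in the prolog, then write with that same charset --->
<cfset xml = '<?xml version="1.0" encoding="UTF-8"?>' & chr(10) & "<root>" & xmlFormat(someText) & "</root>">
<cffile action="write" file="#xmlPath#" output="#xml#" charset="utf-8">

<!--- Passing the path lets the XML parser honor the declared encoding itself --->
<cfset doc = xmlParse(xmlPath)>
```

If the declaration and the charset passed to cffile disagree, the parser decodes the bytes with the wrong encoding, which can produce "invalid character" errors like the one in the question.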

Upvotes: 5
