Reputation: 17976
I am using XMLFormat() to encode some text for an XML document. However, when I go to read the XML file I created I get an invalid character error. Why does XMLFormat() not properly encode all characters?
I'm running CF8.
Upvotes: 3
Views: 5363
Reputation: 1927
This was a huge issue for me as well, and it turns out the charset is the main factor: you need to explicitly specify the correct charset.
In my case the XML contained text in foreign languages, and it wouldn't parse correctly until I put in the correct charset.
Upvotes: 0
Reputation: 19834
Unfortunately, XMLFormat is just not an all-inclusive solution. It has a very limited list of characters that it will replace (essentially the markup characters <, >, &, ' and ") [documentation].
You'll need to do custom encoding of characters that are invalid for XML but not covered by XMLFormat.
It's definitely not very efficient, but one potential solution is to loop over the content of typically suspect fields (anything user-generated, for starters) character by character, check each character's code with Asc(), and if it's above 255, either omit the character or encode it as a numeric character reference.
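A minimal sketch of that character-by-character loop as a UDF. The name xmlFormatStrict is hypothetical, not a built-in. It is worth knowing that the characters XML 1.0 actually forbids are mostly control characters (everything below 32 except tab, LF and CR), plus lone UTF-16 surrogates and U+FFFE/U+FFFF, so the sketch omits those. It encodes everything above 127 (rather than 255) as a hexadecimal reference so the output is plain ASCII whatever charset the file ends up in; raise the threshold if you prefer.

```cfml
<cfscript>
// Hypothetical UDF name: a sketch, not a built-in ColdFusion function.
// XMLFormat() handles the markup characters; the loop then drops characters
// XML 1.0 forbids and turns everything above ASCII into numeric references.
function xmlFormatStrict(text) {
	var src = xmlFormat(text);
	var out = "";
	var i = 1;
	var ch = "";
	var code = 0;
	var low = 0;
	while (i LTE len(src)) {
		ch = mid(src, i, 1);
		code = asc(ch);
		if (code GTE 55296 AND code LTE 56319) {
			// High surrogate: keep it only if a low surrogate follows,
			// combining the UTF-16 pair into a single code point.
			if (i LT len(src)) {
				low = asc(mid(src, i + 1, 1));
				if (low GTE 56320 AND low LTE 57343) {
					code = (code - 55296) * 1024 + (low - 56320) + 65536;
					out = out & "&##x" & formatBaseN(code, 16) & ";";
					i = i + 1; // the low surrogate has been consumed too
				}
			}
		} else if (code LTE 127 AND (code GTE 32 OR code EQ 9 OR code EQ 10 OR code EQ 13)) {
			// ASCII that XML allows (tab, LF and CR are the only
			// permitted control characters): copy it unchanged.
			out = out & ch;
		} else if ((code GT 127 AND code LT 55296) OR (code GT 57343 AND code LT 65534)) {
			// Non-ASCII character: write a hexadecimal character reference.
			out = out & "&##x" & formatBaseN(code, 16) & ";";
		}
		// Anything else (other control characters, lone low surrogates,
		// U+FFFE and U+FFFF) is omitted.
		i = i + 1;
	}
	return out;
}
</cfscript>
<cfset safe = xmlFormatStrict("Price: " & chr(8364) & "5" & chr(11))>
<!--- safe is "Price: &#x20ac;5": the euro sign is encoded, the vertical tab dropped --->
```

Building the result by string concatenation keeps the sketch simple; for very large fields a Java StringBuffer would be faster.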
Upvotes: 0
Reputation: 461
If you're trying to return your XML directly to the browser, you might want to try something like this to have the user download it:
<cfheader name="Content-Disposition" charset="utf-8" value="attachment; filename=export.xml">
<cfcontent type="text/xml; charset=utf-8" variable="#charsetDecode(someXMLPacket, 'utf-8')#" reset="true">
or, if you want it returned as a web page (à la REST), this should do the trick:
<cfcontent type="text/xml; charset=utf-8" variable="#charsetDecode(someXMLPacket, 'utf-8')#" reset="true">
(The variable attribute of cfcontent expects binary data, hence the charsetDecode() call, and the charset in type tells the client how to decode those bytes.)
Hope that helps.
Upvotes: 0
Reputation: 1978
I feel that this is a bug in XMLFormat. I am not sure who the original author of the snippet below is, but here is an approach that catches the extra characters via regex:
<cfscript>
// Runs XMLFormat(), then replaces every remaining non-ASCII character
// with a hexadecimal numeric character reference (function name is arbitrary).
function xmlFormatHighChars(myText) {
	var i = 1; // ReFind() start positions are 1-based
	var tmp = '';
	myText = xmlFormat(myText);
	while (ReFind('[^\x00-\x7F]', myText, i, false)) {
		i = ReFind('[^\x00-\x7F]', myText, i, false); // find the next high char and save its position
		tmp = '&##x#FormatBaseN(Asc(Mid(myText, i, 1)), 16)#;'; // convert the high char to a hex numeric reference
		myText = Insert(tmp, myText, i); // insert the reference right after the high char
		myText = RemoveChars(myText, i, 1); // delete the now-redundant high char
		i = i + Len(tmp); // resume scanning after the inserted reference
	}
	return myText;
}
</cfscript>
Upvotes: 5
Reputation: 59
Also, don't forget to put <cfprocessingdirective pageencoding="utf-8"> at the top of your template; this matters whenever the template itself contains non-ASCII literal text.
Upvotes: 0
Reputation: 338406
Are you sure you are writing the file in the right encoding? You can't just do
<cffile action="write" file="foo.xml" output="#xml#" />
because the file will very likely be written in a character set other than the one your XML is in. Unless stated otherwise (by an encoding declaration), XML files are treated as UTF-8, so you should do:
<cffile action="write" file="foo.xml" output="#xml#" charset="utf-8" />
<!--- and --->
<cffile action="read" file="foo.xml" variable="xml" charset="utf-8" />
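A sketch of the whole round trip under a few assumptions (the file path, element name and sample text are made up for illustration): declare the encoding in the XML itself, use the same charset when writing and when reading, and let XmlParse() confirm that the result is well-formed.

```cfml
<!--- Hypothetical path and element name, for illustration only --->
<cfset xmlPath = expandPath("foo.xml")>
<!--- Declare the encoding so parsers don't have to guess it --->
<cfset xml = '<?xml version="1.0" encoding="UTF-8"?>' & chr(10) & "<note>" & xmlFormat("Price: " & chr(8364) & "5") & "</note>">
<!--- Write and read back with the same charset the declaration names --->
<cffile action="write" file="#xmlPath#" output="#xml#" charset="utf-8">
<cffile action="read" file="#xmlPath#" variable="xmlText" charset="utf-8">
<!--- XmlParse() throws an error if the text is not well-formed XML --->
<cfset doc = xmlParse(xmlText)>
<cfoutput>#doc.xmlRoot.xmlText#</cfoutput>
```

If the charsets on the two cffile calls disagreed with each other or with the declaration, the euro sign would come back garbled.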
Upvotes: 5