Reputation: 41
I would like to parse a UTF-8 encoded XML file with libxml2. My code is written in C and I use v2.9.3 of libxml2.
My code follows:
xmlTextReaderPtr reader;
xmlTextWriterPtr writer;
writer = xmlNewTextWriterFilename("test.xml", 0);
xmlTextWriterStartDocument(writer, NULL, "UTF-8", NULL);
xmlTextWriterStartElement(writer, BAD_CAST "node_with_é_character");
xmlTextWriterEndElement(writer);
xmlTextWriterEndDocument(writer);
xmlFreeTextWriter(writer);
reader = xmlReaderForFile("test.xml", "UTF-8", XML_PARSE_RECOVER);
int ret = 1;
while (ret == 1) {
const xmlChar *nameT = xmlTextReaderConstName(reader);
printf("\n ---> %s\n",nameT);
ret = xmlTextReaderRead(reader);
}
The output is:
---> (null)
---> node_with_é_character
The problem is that the trace does not show "node_with_é_character" as expected; the "é" is displayed incorrectly.
My command prompt is set to code page 1252 ("chcp 1252").
I don't understand why libxml2 cannot store/read the "é" character.
Upvotes: 0
Views: 646
Reputation: 5453
As noted in the comments, you're on Windows, so I guess your source file is probably not saved as UTF-8, which means the C string "node_with_é_character" is not UTF-8 encoded in your executable.
I don't know the libxml2 interfaces, but the code example makes it quite clear that it expects input parameters in UTF-8. See http://xmlsoft.org/examples/testWriter.c
/* Write a comment as child of EXAMPLE.
* Please observe, that the input to the xmlTextWriter functions
* HAS to be in UTF-8, even if the output XML is encoded
* in iso-8859-1 */
tmp = ConvertInput("This is a comment with special chars: <\xE4\xF6\xFC>",
MY_ENCODING);
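To illustrate what such a conversion step amounts to, here is a minimal hand-rolled sketch for ISO-8859-1 input only. The name latin1_to_utf8 is hypothetical and is not part of libxml2, and this is not how testWriter.c implements ConvertInput. It also assumes plain ISO-8859-1; Windows-1252 differs from it in the 0x80-0x9F range, although "é" (0xE9) is the same in both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libxml2 function): converts a NUL-terminated
 * ISO-8859-1 string into a newly allocated UTF-8 string.
 * The caller must free() the result. Returns NULL if allocation fails. */
unsigned char *latin1_to_utf8(const char *in)
{
    size_t len;
    size_t i;
    unsigned char *out;
    unsigned char *p;

    assert(in != NULL);
    len = strlen(in);

    /* Each ISO-8859-1 byte becomes at most two UTF-8 bytes. */
    out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    p = out;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            /* ASCII is identical in both encodings. */
            *p++ = c;
        } else {
            /* Code points 0x80..0xFF take two bytes: 110000xx 10xxxxxx. */
            *p++ = (unsigned char)(0xC0 | (c >> 6));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    return out;
}
```

For example, passing "caf\xE9" yields the bytes 63 61 66 C3 A9, where C3 A9 is the UTF-8 encoding of "é".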
Saving your source file as UTF-8 should help you fix your issue.
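If you want to check what actually ended up in the executable, one option is to validate the literal at run time. The sketch below is only an illustration: is_valid_utf8 and NODE_NAME_UTF8 are hypothetical names, not libxml2 API. It also shows how to spell "é" as explicit UTF-8 bytes, so the literal no longer depends on how the source file is saved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libxml2 function): returns 1 if s is
 * well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    while (*p != 0) {
        int extra;          /* number of continuation bytes expected */
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point allowed for this length */

        if (*p < 0x80) {
            p++;            /* plain ASCII byte */
            continue;
        } else if ((*p & 0xE0) == 0xC0) {
            extra = 1; cp = *p & 0x1F; min = 0x80;
        } else if ((*p & 0xF0) == 0xE0) {
            extra = 2; cp = *p & 0x0F; min = 0x800;
        } else if ((*p & 0xF8) == 0xF0) {
            extra = 3; cp = *p & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }
        p++;

        while (extra-- > 0) {
            /* A continuation byte must look like 10xxxxxx; this test also
             * fails on the terminating NUL, so we never read past it. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (unsigned long)(*p & 0x3F);
            p++;
        }

        /* Reject overlong forms, surrogates, and values above U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}

/* The element name with "é" written as the explicit UTF-8 bytes 0xC3 0xA9. */
const char NODE_NAME_UTF8[] = "node_with_\xC3\xA9_character";
```

With this, is_valid_utf8(NODE_NAME_UTF8) returns 1, whereas a literal holding the single Windows-1252 byte, "node_with_\xE9_character", returns 0.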
Upvotes: 1