cconigli

Reputation: 41

libXML2 cannot properly read its own UTF-8 XML output

I would like to parse a UTF-8 encoded XML file with libXML2. My code is written in C and I use v2.9.3 of libXML2.

My code follows:

    xmlTextReaderPtr reader;
    xmlTextWriterPtr writer;
    writer = xmlNewTextWriterFilename("test.xml", 0);
    xmlTextWriterStartDocument(writer, NULL, "UTF-8", NULL);
    xmlTextWriterStartElement(writer, BAD_CAST "node_with_é_character");

    xmlTextWriterEndElement(writer);
    xmlTextWriterEndDocument(writer);
    xmlFreeTextWriter(writer);
    reader = xmlReaderForFile("test.xml", "UTF-8", XML_PARSE_RECOVER);

    int ret = 1;
    while (ret == 1) {
        const xmlChar *nameT = xmlTextReaderConstName(reader);

        printf("\n   ---> %s\n", nameT);
        ret = xmlTextReaderRead(reader);
    }

The output is:

   ---> (null)

   ---> node_with_é_character

The problem is that the trace does not show "node_with_é_character" as expected: the "é" character comes out wrong.

My command prompt is set to code page 1252 ("chcp 1252").

I don't understand why libXML2 cannot store/read the "é" character.

Upvotes: 0

Views: 646

Answers (1)

Yann Droneaud

Reputation: 5453

As noted in the comments, you are on Windows, so it's likely that your source file is not UTF-8 encoded. In that case the C string "node_with_é_character" is not stored as UTF-8 in your executable.
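One way to test this guess is to check the bytes of the string literal at run time. The following is a minimal sketch in plain C; the helper name `is_valid_utf8` is hypothetical (it is not part of libxml2), and the check is deliberately simplified: it validates lead and continuation bytes but does not reject every overlong or surrogate form.

```c
#include <stddef.h>

/* Hypothetical helper, not a libxml2 function.
 * Returns 1 if the NUL-terminated string s looks like well-formed UTF-8,
 * 0 otherwise. Simplified check: lead bytes and continuation bytes only. */
int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        size_t extra;                       /* continuation bytes expected */

        if (*p < 0x80)
            extra = 0;                      /* plain ASCII */
        else if (*p >= 0xC2 && *p <= 0xDF)
            extra = 1;                      /* 2-byte sequence */
        else if ((*p & 0xF0) == 0xE0)
            extra = 2;                      /* 3-byte sequence */
        else if (*p >= 0xF0 && *p <= 0xF4)
            extra = 3;                      /* 4-byte sequence */
        else
            return 0;                       /* stray continuation or bad lead byte */

        p++;
        while (extra--) {
            /* A NUL terminator also fails this test, so we never read past the end. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            p++;
        }
    }
    return 1;
}
```

In a source file saved as Windows-1252, "é" is the single byte 0xE9, so `is_valid_utf8("node_with_\xE9_character")` returns 0. In a UTF-8 source file it is the two bytes 0xC3 0xA9, and `is_valid_utf8("node_with_\xC3\xA9_character")` returns 1.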

I don't know the libxml2 interfaces, but the code example makes it quite clear that it expects input parameters in UTF-8. See http://xmlsoft.org/examples/testWriter.c

    /* Write a comment as child of EXAMPLE.
     * Please observe, that the input to the xmlTextWriter functions
     * HAS to be in UTF-8, even if the output XML is encoded
     * in iso-8859-1 */
    tmp = ConvertInput("This is a comment with special chars: <\xE4\xF6\xFC>",
                       MY_ENCODING);

Saving your source file as UTF-8 should help you fix your issue.
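If changing the source file encoding is not an option, the string can instead be converted to UTF-8 before it is handed to the writer, which is the idea behind `ConvertInput` in the example above. Below is a minimal sketch in plain C, assuming the input is ISO-8859-1 (Latin-1); the name `latin1_to_utf8` is hypothetical and not a libxml2 function. Windows-1252 matches Latin-1 for "é" (0xE9) but differs in the 0x80 to 0x9F range, which this sketch does not handle.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a libxml2 function.
 * Converts a NUL-terminated ISO-8859-1 string to a newly allocated
 * UTF-8 string. The caller must free() the result.
 * Returns NULL if the allocation fails. */
char *latin1_to_utf8(const char *in)
{
    /* Each Latin-1 byte becomes at most two UTF-8 bytes. */
    char *out = malloc(2 * strlen(in) + 1);
    char *q = out;

    if (out == NULL)
        return NULL;

    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p < 0x80) {
            *q++ = (char)*p;                    /* ASCII is unchanged */
        } else {
            *q++ = (char)(0xC0 | (*p >> 6));    /* lead byte: 0xC2 or 0xC3 */
            *q++ = (char)(0x80 | (*p & 0x3F));  /* continuation byte */
        }
    }
    *q = '\0';
    return out;
}
```

For example, `latin1_to_utf8("node_with_\xE9_character")` yields the bytes of `"node_with_\xC3\xA9_character"`, which could then be passed to `xmlTextWriterStartElement` with `BAD_CAST`. Another option that does not depend on the source file encoding at all is to write the literal with explicit escapes, as in `BAD_CAST "node_with_\xC3\xA9_character"`.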

Upvotes: 1
