neuviemeporte

Reputation: 6478

Why can't I parse an XML file using QXmlStreamReader from Qt?

I'm trying to figure out how QXmlStreamReader works for a C++ application I'm writing. The XML file I want to parse is a large dictionary with a convoluted structure and plenty of Unicode characters, so I decided to try a small test case with a simpler document first. Unfortunately, I hit a wall. Here's the example XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<persons>
    <person>
        <firstname>John</firstname>
        <surname>Doe</surname>
        <email>[email protected]</email>
        <website>http://en.wikipedia.org/wiki/John_Doe</website>
    </person>
    <person>
        <firstname>Jane</firstname>
        <surname>Doe</surname>
        <email>[email protected]</email>
        <website>http://en.wikipedia.org/wiki/John_Doe</website>
    </person>
    <person>
        <firstname>Matti</firstname>
        <surname>Meikäläinen</surname>
        <email>[email protected]</email>
        <website>http://fi.wikipedia.org/wiki/Matti_Meikäläinen</website>
    </person>
</persons>

...and I'm trying to parse it using this code:

int main(int argc, char *argv[])
{
    if (argc != 2) return 1;

    QString filename(argv[1]);
    QTextStream cout(stdout);
    cout << "Starting... filename: " << filename << endl;

    QFile file(filename);
    bool open = file.open(QIODevice::ReadOnly | QIODevice::Text);
    if (!open) 
    {
        cout << "Couldn't open file" << endl;
        return 1;
    }
    else 
    {
        cout << "File opened OK" << endl;
    }

    QXmlStreamReader xml(&file);
    cout << "Encoding: " << xml.documentEncoding().toString() << endl;

    while (!xml.atEnd() && !xml.hasError()) 
    {
        xml.readNext();
        if (xml.isStartElement())
        {
            cout << "element name: '" << xml.name().toString() << "'" 
                << ", text: '" << xml.text().toString() << "'" << endl;
        }
        else if (xml.hasError())
        {
            cout << "XML error: " << xml.errorString() << endl;
        }
        else if (xml.atEnd())
        {
            cout << "Reached end, done" << endl;
        }
    }

    return 0;
}

...then I get this output:

C:\xmltest\Debug>xmltest.exe example.xml
Starting... filename: example.xml
File opened OK
Encoding:
XML error: Encountered incorrectly encoded content.

What happened? This file couldn't be simpler, and it looks consistent to me. With my original file I also get a blank entry for the encoding; the entries' names are displayed, but alas, their text() is empty as well. Any suggestions are greatly appreciated; personally, I'm thoroughly mystified.

Upvotes: 11

Views: 23274

Answers (5)

Muhammad Suleman

Reputation: 2922

Try this example. I just copied it from my project, and it works for me.

void MainWindow::readXML(const QString &fileName)
{
    // fileName is a const reference, so it cannot be reassigned here;
    // pass the path (e.g. "D:/read.xml") from the caller instead.
    // A stack-allocated QFile is closed automatically on return.
    QFile file(fileName);
    if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
    {
        QMessageBox::critical(this, "QXSRExample::ReadXMLFile", "Couldn't open xml file", QMessageBox::Ok);
        return;
    }

    /* QXmlStreamReader takes any QIODevice. */
    QXmlStreamReader xml(&file);
    /* We'll parse the XML until we reach the end of it. */
    while (!xml.atEnd() && !xml.hasError())
    {
        /* Read the next token. */
        QXmlStreamReader::TokenType token = xml.readNext();
        /* If the token is just StartDocument, go on to the next one. */
        if (token == QXmlStreamReader::StartDocument)
            continue;

        /* If the token is a StartElement, see if it is one we want. */
        if (token == QXmlStreamReader::StartElement) {
            if (xml.name() == QLatin1String("email")) {
                ui->listWidget->addItem("Element: " + xml.name().toString());
                continue;
            }
        }
    }
    /* Error handling. */
    if (xml.hasError())
        QMessageBox::critical(this, "QXSRExample::parseXML", xml.errorString(), QMessageBox::Ok);

    // Reset the reader's internal state to the initial state.
    xml.clear();
}

void MainWindow::writeXML(const QString &fileName)
{
    // fileName is a const reference, so it cannot be reassigned here;
    // pass the path (e.g. "D:/write.xml") from the caller instead.
    QFile file(fileName);
    if (!file.open(QIODevice::WriteOnly | QIODevice::Text))
    {
        QMessageBox::critical(this, "QXSRExample::WriteXMLFile", "Couldn't open xml file", QMessageBox::Ok);
        return;
    }
    QXmlStreamWriter xmlWriter(&file);
    xmlWriter.setAutoFormatting(true);
    xmlWriter.writeStartDocument();
    // Add elements.
    xmlWriter.writeStartElement("bookindex");
    ui->listWidget->addItem("bookindex");
    xmlWriter.writeStartElement("Suleman");
    ui->listWidget->addItem("Suleman");

    // writeEndDocument() closes all start elements that are still open.
    xmlWriter.writeEndDocument();
    file.close();
    if (file.error() != QFile::NoError)
        QMessageBox::critical(this, "QXSRExample::WriteXMLFile", file.errorString(), QMessageBox::Ok);
}

Upvotes: 1

neuviemeporte

Reputation: 6478

I'm answering this myself as this problem was related to three issues, two of which were brought up by the responses.

  1. The file actually wasn't UTF-8 encoded. I changed the declared encoding to iso-8859-1 and the encoding error disappeared.
  2. The text() function doesn't work as I expected. I have to use readElementText() to read the entries' contents.
  3. When I try to readElementText() on an element that doesn't contain text, like the top-level <persons> in my case, the parser returns an "Expected character data" error and the parsing is interrupted. I find this behaviour strange (in my opinion returning an empty string and continuing would be better) but I guess as long as the specification is known, I can work around it and avoid calling this function on every entry.

The relevant code section that works as expected now looks like this:

while (!xml.atEnd() && !xml.hasError()) 
{
    xml.readNext();
    if (xml.isStartElement())
    {
        QString name = xml.name().toString();
        if (name == "firstname" || name == "surname" || 
            name == "email" || name == "website")
        {
            cout << "element name: '" << name  << "'" 
                         << ", text: '" << xml.readElementText() 
                         << "'" << endl;
        }
    }
}
if (xml.hasError())
{
    cout << "XML error: " << xml.errorString() << endl;
}
else if (xml.atEnd())
{
    cout << "Reached end, done" << endl;
}

Upvotes: 14

hmuelner

Reputation: 8221

Are you sure your document is UTF-8 encoded? What editor did you use? Check what the ä characters look like if you view the file without decoding.
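
One way to check this without relying on an editor is to look at the raw bytes. In ISO-8859-1 the character ä is the single byte 0xE4, while in UTF-8 it is the two bytes 0xC3 0xA4. Below is a minimal sketch in plain C++ (no Qt). The function name looksLikeUtf8 is made up for illustration, and the check is deliberately simplified: it only validates lead and continuation bytes, so it does not reject every overlong or surrogate encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` has the byte structure of UTF-8.
// Simplified check, not a full validator.
bool looksLikeUtf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                       extra = 0;  // plain ASCII
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;  // 2-byte sequence
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;  // 3-byte sequence
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;  // 4-byte sequence
        else return false;  // stray continuation byte or invalid lead byte

        // The sequence must not run past the end of the input.
        if (extra > bytes.size() - i - 1) return false;

        // Every continuation byte must have the bit pattern 10xxxxxx.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char next = static_cast<unsigned char>(bytes[i + k]);
            if ((next & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

Passing the file's contents (read in binary mode) to this function should return false for a file saved as ISO-8859-1 that contains "Meikäläinen", because 0xE4 is followed by an ASCII letter rather than a continuation byte. Latin-1 text can occasionally form valid UTF-8 by accident, so treat a true result as "probably UTF-8", not as proof.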

Upvotes: 2

Frank Osterfeld

Reputation: 25155

About the encoding: As baysmith and hmuelner said, your file is probably incorrectly encoded (unless the encoding got lost when pasting it here). Try to fix that with some advanced text editor.

The problem with your usage of text() is that it doesn't work as you expect it to. text() returns the content of the current token if it is of type Characters, Comment, DTD or EntityReference. Your current token is a StartElement, so it's empty. If you want to consume/read the text of the current StartElement, use readElementText() instead.

Upvotes: 3

baysmith

Reputation: 5202

The file is not UTF-8 encoded. Change the encoding in the XML declaration to iso-8859-1 and it will parse without error.

<?xml version="1.0" encoding="iso-8859-1" ?>

Upvotes: 4
