ntjz_kakarot

Reputation: 13

String matching in Qt from an html

I'm trying to extract specific words from an HTML document and display them in a plain text edit for now (I will later put them into a table). I can find where the word begins, but not where it ends: the result contains everything from the starting position onward. The HTML looks something like this:

<span class="title">Some name here</span>

This is the code I wrote:

int sTitle = html_code.indexOf("title\">") + 7;
int eTitle = html_code.indexOf("</span>");
int titLength = eTitle - sTitle;

QString title = html_code.mid(sTitle, titLength);

ui->searchBox->setPlainText(title);

The HTML also contains many other </span> and title tags. Thank you!

Upvotes: 1

Views: 265

Answers (2)

Orest Hera

Reputation: 6776

Your code works perfectly if the following string is assigned to html_code:

 QString html_code = "<span class=\"title\">Some name here</span>";
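With several spans in the document, though, indexOf("</span>") returns the first closing tag anywhere in the string. If that tag lies before the title, the computed length is negative, which would likely explain why mid() returns everything from the start position. A minimal sketch of the usual fix is to search for the closing tag starting from the title position. It uses plain std::string so that it builds without Qt; the function name firstTitle and the exact opening-tag markup are assumptions made for illustration.

```cpp
#include <string>

// Returns the text of the first <span class="title"> element in html,
// or an empty string if no complete element is found.
// firstTitle is a hypothetical helper, not part of Qt or the question's code.
std::string firstTitle(const std::string& html)
{
    const std::string openTag = "<span class=\"title\">";  // assumed exact markup
    const std::string closeTag = "</span>";

    const std::size_t open = html.find(openTag);
    if (open == std::string::npos)
        return "";

    const std::size_t start = open + openTag.size();
    // Search for the closing tag only from the start of the title text,
    // so that earlier </span> tags in the document are skipped.
    const std::size_t end = html.find(closeTag, start);
    if (end == std::string::npos)
        return "";

    return html.substr(start, end - start);
}
```

With QString the same idea is a one-argument change: indexOf() accepts a start position as its second parameter, so html_code.indexOf("</span>", sTitle) only looks at the text after the title begins.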

However, for more complex documents you may consider the heavy but powerful QtWebKit module and its QWebElement class, which provides access to the DOM tree of an (X)HTML document. It lets you search for just the first matching element (or a more complex structure) or for a collection of all matching elements, for example:

#include <QWebPage>
#include <QWebFrame>
#include <QWebElement>

void MainWindow::some_handler()
{
    QString html_code = "<span class=\"title\">Some name here</span>"
        "<span class=\"title\">Some other name here</span>";

    QWebPage page;
    QWebFrame *frame = page.mainFrame();
    frame->setHtml(html_code);
    QWebElement document = frame->documentElement();

    // one item
    QWebElement title = document.findFirst("span.title");

    QString text;
    text += "First title span:\n\t" + title.toPlainText() + '\n';

    // all items
    QWebElementCollection title_collection = document.findAll("span.title");
    text += "\nAll title spans:\n";

    foreach (QWebElement elem, title_collection) {
        text += '\t' + elem.toPlainText() + '\n';
    }

    ui->searchBox->setPlainText(text);
}

To build the above code, add the following module to the project file: QT += webkitwidgets

Note that a QWebPage object works like a browser: it loads linked content and runs JavaScript. If that is not desired, other XML parsers may be considered, for example the Qt XML module. This module is not actively maintained, but it also provides a tree-structured API for document elements via the QDomDocument, QDomElement and QDomNodeList classes. The code is not as nice as with QWebElement, since you have to loop over the node list and manually check each node's type and its "class" attribute, for example:

#include <QDomDocument>

QDomDocument document;
document.setContent(html_code);  // expects well-formed XML
QDomElement elem = document.documentElement();
// all <span> descendants of the root element
QDomNodeList node_list = elem.elementsByTagName("span");
QString text;
for (int i = 0; i < node_list.length(); ++i) {
    // toElement() returns a null element if the node is not an element
    QDomElement span = node_list.at(i).toElement();
    if (!span.isNull() && span.attribute("class") == "title") {
        text += span.text() + '\n';
    }
}

Upvotes: 1

mR.aTA

Reputation: 314

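Try this:

int sTitle = html_code.indexOf("title\">") + 7;
// look for the closing tag only after the start of the title text
int eTitle = html_code.indexOf("</span>", sTitle);
// QStringRef takes a pointer to the string, a position and a length
QStringRef title(&html_code, sTitle, eTitle - sTitle);
ui->searchBox->setPlainText(title.toString());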
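
Since the page contains many title spans, the same index-based search can be repeated in a loop to collect all of them, which may be useful for filling the table later. The following is only a sketch under a few assumptions: it uses plain std::string instead of QString so that it builds without Qt, the helper name allTitles is hypothetical, and it assumes the opening tag is written exactly as in the question and that title spans contain no nested spans.

```cpp
#include <string>
#include <vector>

// Collects the text of every <span class="title"> element in html.
// allTitles is a hypothetical helper used only for illustration.
std::vector<std::string> allTitles(const std::string& html)
{
    const std::string openTag = "<span class=\"title\">";  // assumed exact markup
    const std::string closeTag = "</span>";
    std::vector<std::string> titles;

    std::size_t pos = 0;
    // Each find() starts where the previous element ended.
    while ((pos = html.find(openTag, pos)) != std::string::npos) {
        const std::size_t start = pos + openTag.size();
        const std::size_t end = html.find(closeTag, start);
        if (end == std::string::npos)
            break;  // unterminated span: stop scanning
        titles.push_back(html.substr(start, end - start));
        pos = end + closeTag.size();  // continue after this element
    }
    return titles;
}
```

Plain string searching like this is fragile against attribute order, extra whitespace or nested tags, so for real-world pages a proper parser is the safer choice.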

Upvotes: 0
