vsz

Reputation: 4893

Why does QWebFrame omit a large part of my HTML?

I have HTML in the form of a QByteArray, and I would like to parse it.

QWebPage webpage;
webpage.mainFrame()->setContent(html);
QWebElementCollection elements = webpage.mainFrame()->findAllElements("div");

However, the collection comes back empty, even though the HTML contains plenty of <div> elements.

If I print qDebug() << webpage.mainFrame()->toHtml();, all I see is "<html><head></head><body></body></html>", although html contains a nice big page with a header, body, tables, and content.

If I use setHtml instead of setContent (converting html to a QString), I get a little more, but not much. Printing qDebug() << webpage.mainFrame()->toHtml(); shows the head with its contents but without the stylesheets, and the body is omitted completely. The output ends with "...</head></html>".

Upvotes: 0

Views: 92

Answers (1)

alexisdm

Reputation: 29896

For large content, the DOM structure may not be available right after the call to setContent() or setHtml(). You should let the event loop run and wait for the QWebPage::loadFinished() signal before doing anything with the DOM. Even then, if JavaScript is involved, the final DOM may not be complete yet.

You can use QEventLoop or a loop with QCoreApplication::processEvents() to wait for the signal within the same function. For instance:

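#include <QEventLoop>
#include <QWebElement>   // also declares QWebElementCollection
#include <QWebFrame>
#include <QWebPage>

QWebPage webpage;

QEventLoop loop;
// Connect with Qt::QueuedConnection so that quit() always reaches the running
// loop: if loadFinished is emitted before loop.exec() starts, the queued
// quit() call waits in the event queue instead of being lost.
QObject::connect(&webpage, SIGNAL(loadFinished(bool)), &loop, SLOT(quit()),
                 Qt::QueuedConnection);
webpage.mainFrame()->setContent(html);

// Process events here until loadFinished triggers quit().
loop.exec();

QWebElementCollection elements = webpage.mainFrame()->findAllElements("div");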
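The reasoning behind Qt::QueuedConnection in the snippet above can be checked with a toy model. The sketch below is plain standard C++, not Qt, so it compiles without QtWebKit; ToyEventLoop, waitWithConnection and waitByPolling are hypothetical names. It assumes the worst case, in which loadFinished fires synchronously before exec() is reached, and it models the behaviour the comment relies on: a quit() issued before the loop is running is lost, while a queued quit() is still dispatched once exec() starts. waitByPolling sketches the processEvents() alternative mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy single-threaded event loop: NOT Qt's API, only a model of it.
class ToyEventLoop {
public:
    // Queue an event (what a queued connection does with a slot call).
    void post(std::function<void()> event) { queue_.push_back(std::move(event)); }

    // Modelled on QEventLoop::quit(): only affects a loop that is running.
    void quit() {
        if (running_) running_ = false;
    }

    // Dispatch the events queued so far once, like processEvents().
    void processEvents() {
        std::size_t pending = queue_.size();
        while (pending-- > 0 && !queue_.empty()) {
            auto event = std::move(queue_.front());
            queue_.pop_front();
            event();
        }
    }

    // Dispatch events until quit() is called. Returns false if the queue
    // runs dry first; a real event loop would block forever at that point.
    bool exec() {
        running_ = true;
        while (running_) {
            if (queue_.empty()) {
                running_ = false;
                return false;
            }
            auto event = std::move(queue_.front());
            queue_.pop_front();
            event();
        }
        return true;
    }

private:
    std::deque<std::function<void()>> queue_;
    bool running_ = false;
};

// Assumed worst case: loadFinished is emitted synchronously inside
// "setContent()", i.e. before loop.exec() is reached.
bool waitWithConnection(bool queued) {
    ToyEventLoop loop;
    auto onLoadFinished = [&loop, queued] {
        if (queued)
            loop.post([&loop] { loop.quit(); });  // like Qt::QueuedConnection
        else
            loop.quit();                          // like Qt::DirectConnection
    };
    onLoadFinished();    // the "signal" fires during setContent()
    return loop.exec();  // true only if quit() reached the running loop
}

// The processEvents() alternative: pump events until a flag is set.
bool waitByPolling() {
    ToyEventLoop loop;
    bool finished = false;
    loop.post([&finished] { finished = true; });  // loadFinished, delivered later
    for (int i = 0; i < 100 && !finished; ++i)    // bounded so the toy cannot spin
        loop.processEvents();
    return finished;
}
```

In this model waitWithConnection(false) returns false (the direct quit() arrives before the loop runs and is lost, so a real QEventLoop would block), while waitWithConnection(true) and waitByPolling() both return true.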

Upvotes: 1
