Reputation: 4893
I have HTML in the form of a QByteArray, and I would like to parse it.
QWebPage webpage;
webpage.mainFrame()->setContent(html);
QWebElementCollection elements = webpage.mainFrame()->findAllElements("div");
However, it turns up empty, even though the html has plenty of <div>s.
If I print qDebug() << webpage.mainFrame()->toHtml(); all I see is "<html><head></head><body></body></html>", although in html there is a nice big page, with header, body, tables, and contents.
If I use setHtml instead of setContent by converting html to QString, I get a little bit more, but not much. If I print qDebug() << webpage.mainFrame()->toHtml(); I see the header with its contents but without the stylesheets, and the body is completely omitted. It ends with "...</head></html>"
Upvotes: 0
Views: 92
Reputation: 29896
For large content, the DOM structure may not be available immediately after the call to setContent or setHtml. You should let the event loop run and wait for the signal QWebPage::loadFinished() before doing anything with the DOM (and even then, if JavaScript is involved, the final DOM won't be there yet).
You can use QEventLoop or a loop with QCoreApplication::processEvents() to wait for the signal within the same function.
For instance:
QWebPage webpage;
QEventLoop loop;
// The signal is connected with Qt::QueuedConnection,
// so that the loadFinished signal always triggers the quit() slot
// after loop.exec() has started, even if the load finished
// synchronously and the loop wasn't strictly needed
QObject::connect(&webpage, SIGNAL(loadFinished(bool)), &loop, SLOT(quit()),
Qt::QueuedConnection);
webpage.mainFrame()->setContent(html);
loop.exec();
QWebElementCollection elements = webpage.mainFrame()->findAllElements("div");
Upvotes: 1