Reputation: 885
I am trying to extract the text enclosed in <td> tags. My problem is that I can only fetch the first result and can't get the others.
From the following HTML, I only get the first result, which is this text:
Student Name
All my other attempts to capture the rest of the needed text come back empty (null). Why is that, and what am I doing wrong?
Text for the regular expression to work on:
<table width="52%" border="1" align="center" cellpadding="1" cellspacing="1">
<tr>
<td colspan="2" align="center" bgcolor="#999999">Result</td>
</tr>
<tr>
<td width="22%"><strong>Student ID</strong></td>
<td width="78%">13/0003337/99</td>
</tr>
<tr>
<td><strong>Student Name</strong></td>
<td>Alaa Salah Yousuf Omer</td>
</tr>
<tr>
<td><strong>College</strong></td>
<td>Medicine & General Surgery</td>
</tr>
<tr>
<td><strong>Subspecialty</strong></td>
<td>General</td>
</tr>
<tr>
<td><strong>Semester</strong></td>
<td>Fourth</td>
</tr>
<tr>
<td><strong>State</strong></td>
<td>Pass</td>
</tr>
<tr>
<td><strong>Semester's GPA</strong></td>
<td>2.89</td>
</tr>
<tr>
<td><strong>Overall GPA</strong></td>
<td>3.13</td>
</tr>
</table>
My code:
QString resultHTML = "A variable containing the html code written above.";
QRegularExpression regex("<td>(.*)</td>", QRegularExpression::MultilineOption);
QRegularExpressionMatch match = regex.match(resultHTML);
// I only get the 1st result logged within the debugger
for(int x = 0; x <= match.capturedLength(); x++)
{
    qDebug() << match.captured(x);
}
// This here doesn't get me anything, null!
_studentName = match.captured(2);
_semesterWritten = match.captured(8);
_stateWritten = match.captured(10);
_currentGPA = match.captured(12);
_overallGPA = match.captured(14);
Upvotes: 0
Views: 1600
Reputation: 417
You are looking for what Perl calls the global regex flag/modifier, which means: keep looking for matches after the first one has been found.
To do that with Qt, use globalMatch() instead of match().
globalMatch() returns a QRegularExpressionMatchIterator, which you can iterate over to find all your matches.
Additionally, the * in <td>(.*)</td> is greedy, so it will find the first instance of <td>, then capture as much as possible, as long as it can find a </td> at the end. Once the dot is allowed to match newlines, that includes most of your content and the additional <td> tags.
There are different ways of avoiding this. One way is to use <td>(.*?)</td>, which captures as little as possible, as long as it can find a </td> at the end. This essentially captures everything within a single <td> element, as long as there isn't another <td> nested further within (which doesn't appear to be the case in your scenario).
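To make the greedy/lazy difference concrete, here is a minimal sketch using the standard library's std::regex rather than QRegularExpression (an assumption made only so the snippet builds without Qt; the quantifier behaviour is the same). The helper name firstCapture and the one-line sample row are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns capture group 1 of the first match of
// `pattern` in `input`, or an empty string if nothing matches.
std::string firstCapture(const std::string& input, const std::string& pattern)
{
    const std::regex re(pattern);   // ECMAScript grammar by default
    std::smatch m;
    if (std::regex_search(input, m, re))
        return m[1].str();
    return std::string();
}
```

For the sample row `<td>Fourth</td><td>Pass</td>`, the greedy pattern `<td>(.*)</td>` captures `Fourth</td><td>Pass`, while the lazy pattern `<td>(.*?)</td>` captures only `Fourth`.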
Additionally, the QRegularExpression::MultilineOption pattern option isn't needed here, since it only affects the anchors ^ and $, which you aren't using.
You might instead be interested in the QRegularExpression::DotMatchesEverythingOption pattern option, which lets the dot match newlines as well, in case those <td> tags, or the values contained within them, happen to span multiple lines.
Upvotes: 2
Reputation: 2832
...Global matching is useful to find all the occurrences of a given regular expression inside a subject string...
QRegularExpressionMatchIterator i = regex.globalMatch(resultHTML);
while (i.hasNext())
{
    QRegularExpressionMatch match = i.next();
    qDebug() << match.captured();
}
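Note that captured() with no argument returns the whole match, tags included; captured(1) returns just the text inside the parentheses. For readers without Qt at hand, the same "iterate over every match and keep group 1" loop can be sketched with std::sregex_iterator from the standard library. This is an analogue of globalMatch(), not the Qt API, and the function name allCellTexts is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects capture group 1 of every non-overlapping
// match of <td>(.*?)</td> in `html`. Cells whose opening tag carries
// attributes (e.g. <td width="78%">) do not match the literal "<td>".
std::vector<std::string> allCellTexts(const std::string& html)
{
    static const std::regex cell("<td>(.*?)</td>");
    std::vector<std::string> texts;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(html.begin(), html.end(), cell); it != end; ++it)
        texts.push_back((*it)[1].str());   // group 1 = text between the tags
    return texts;
}
```

With this pattern the label cells still contain their <strong> markup, so the plain values (name, semester, state, GPAs) end up at every second position of the result, provided the rows look like the ones in the question.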
Upvotes: 2