Alaa Salah

Reputation: 885

Qt / QRegularExpression - Can't capture all results, only 1st instance, why?

I am trying to get some text surrounded by <td> tags. My problem is that I am able to fetch the first result only, and can't get others.

From the following HTML, I only get the 1st result which is this text:

Student Name

But all my other attempts to capture the rest of the needed text come back empty (null). Why is that, and what am I doing wrong?

Text for regular expression to work on:

<table width="52%" border="1" align="center" cellpadding="1" cellspacing="1">
  <tr>
    <td colspan="2" align="center" bgcolor="#999999">Result</td>
    </tr>
  <tr>
    <td width="22%"><strong>Student ID</strong></td>
    <td width="78%">13/0003337/99</td>
  </tr>
  <tr>
    <td><strong>Student Name</strong></td>
    <td>Alaa Salah Yousuf Omer</td>
  </tr>
  <tr>
    <td><strong>College</strong></td>
    <td>Medicine & General Surgery</td>
  </tr>
  <tr>
    <td><strong>Subspecialty</strong></td>
    <td>General</td>
  </tr>
  <tr>
    <td><strong>Semester</strong></td>
    <td>Fourth</td>
  </tr>
  <tr>
    <td><strong>State</strong></td>
    <td>Pass</td>
  </tr>
  <tr>
    <td><strong>Semester's GPA</strong></td>
    <td>2.89</td>
  </tr>
  <tr>
    <td><strong>Overall GPA</strong></td>
    <td>3.13</td>
  </tr>
  </table>

My code:

QString resultHTML = "A variable containing the html code written above.";

QRegularExpression regex("<td>(.*)</td>", QRegularExpression::MultilineOption);
QRegularExpressionMatch match = regex.match(resultHTML);

// I only get the 1st result logged within the debugger
for(int x = 0; x <= match.capturedLength(); x++)
{
    qDebug() << match.captured(x);
}

// This here doesn't get me anything, null!
_studentName = match.captured(2);
_semesterWritten = match.captured(8);
_stateWritten = match.captured(10);
_currentGPA = match.captured(12);
_overallGPA = match.captured(14);

Upvotes: 0

Views: 1600

Answers (2)

Jordan Pilat

Reputation: 417

You are looking to apply what Perl refers to as the global regex flag/modifier, which means: keep looking for matches after the first one has been found.

To do that with Qt, try using globalMatch() instead of match().

The former returns a QRegularExpressionMatchIterator, over which you can iterate to find all your matches.

Additionally, the * in <td>(.*)</td> is greedy, so it will find the first instance of <td>, then capture as much as possible (including any additional <td> tags on the same line, or most of your content if the dot is allowed to match newlines), as long as it can find a </td> at the end.

There are different ways of avoiding this. One way is to use <td>(.*?)</td>, which would capture as little as possible, as long as it can find a </td> at the end. This would essentially capture everything within a single <td /> tag, as long as there isn't another <td /> nested further within (which doesn't look to be the case in your scenario).
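The greedy/lazy difference is easiest to see on a single line that holds two cells. The sketch below uses std::regex from the C++ standard library as a stand-in for QRegularExpression, purely so that it compiles without Qt; .* and .*? behave the same way in both engines. The helper firstCapture and the sample string kRow are made up for this illustration.

```cpp
#include <regex>
#include <string>

// Returns capture group 1 of the first match, or an empty string if nothing matches.
// firstCapture is a hypothetical helper, not part of Qt or the standard library.
std::string firstCapture(const std::string &text, const std::string &pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re))
        return m[1].str();
    return std::string();
}

// Two cells on one line, so '.' can run from the first <td> to the last </td>.
const std::string kRow = "<td>Pass</td><td>2.89</td>";

// Greedy: firstCapture(kRow, "<td>(.*)</td>")  yields "Pass</td><td>2.89"
// Lazy:   firstCapture(kRow, "<td>(.*?)</td>") yields "Pass"
```

With the greedy pattern the group swallows the closing tag of the first cell and the opening tag of the second; the lazy one stops at the first </td> it can reach.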

Additionally, the QRegularExpression::MultilineOption PatternOption isn't needed here, since it pertains to the regex characters ^ and $, which you aren't using.

You might instead be interested in the QRegularExpression::DotMatchesEverythingOption PatternOption, which lets the dot match newlines as well, just in case those <td /> tags, or the values contained within, happen to span multiple lines.
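To illustrate why the dot and newlines matter, here is a small sketch with a cell whose value wraps onto a second line. It again uses std::regex as a stand-in so that it builds without Qt. std::regex has no equivalent of DotMatchesEverythingOption, so the sketch approximates it with the character class [\s\S]; in Qt you would keep the dot and pass the option instead. The helper matchesCell and the sample kWrapped are hypothetical.

```cpp
#include <regex>
#include <string>

// True if the pattern matches anywhere in the text.
// matchesCell is a hypothetical helper used only for this illustration.
bool matchesCell(const std::string &text, const std::string &pattern)
{
    return std::regex_search(text, std::regex(pattern));
}

// A cell whose value wraps onto a second line (made-up sample).
const std::string kWrapped = "<td>Medicine &\nGeneral Surgery</td>";

// "<td>(.*?)</td>"        -> no match: '.' stops at the newline
// "<td>([\\s\\S]*?)</td>" -> match: [\s\S] accepts any character, newline included
```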

Upvotes: 2

Vladimir Bershov

Reputation: 2832

...Global matching is useful to find all the occurrences of a given regular expression inside a subject string...

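// globalMatch() returns an iterator over every match, not just the first one.
QRegularExpressionMatchIterator i = regex.globalMatch(resultHTML);

while (i.hasNext())
{
    QRegularExpressionMatch match = i.next();
    // captured() is the whole match, tags included; captured(1) is the text inside the group.
    qDebug() << match.captured();
}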
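If the goal is to pick individual values out by position, as the question tries to do with captured(2), captured(8) and so on, one option is to collect group 1 of every match into a list first. A minimal sketch follows, once more with std::regex standing in for QRegularExpression so that it compiles without Qt: std::sregex_iterator plays the role of the iterator returned by globalMatch(), and (*it)[1] corresponds to match.captured(1). The names allCells and kSample are made up, and the sample is a shortened version of the question's table.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects capture group 1 of every non-overlapping match, in document order.
// allCells is a hypothetical helper, not part of Qt or the standard library.
std::vector<std::string> allCells(const std::string &html)
{
    static const std::regex re("<td>(.*?)</td>");
    std::vector<std::string> cells;
    for (std::sregex_iterator it(html.begin(), html.end(), re), end; it != end; ++it)
        cells.push_back((*it)[1].str());
    return cells;
}

// Shortened sample in the shape of the question's table.
const std::string kSample =
    "<td><strong>Student Name</strong></td>\n"
    "<td>Alaa Salah Yousuf Omer</td>\n"
    "<td><strong>State</strong></td>\n"
    "<td>Pass</td>\n";

// allCells(kSample)[1] is "Alaa Salah Yousuf Omer", allCells(kSample)[3] is "Pass".
```

Note that a literal <td> in the pattern skips cells that carry attributes, such as <td width="78%"> in the Student ID row, which is why "Student Name" shows up as the first result in the question.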

Upvotes: 2
