Reputation: 1179
I need to parse specific HTML from a website. Here is part of it:
<div class="_ss">
<div class="info">
First info.
</div>
<div class="info">
Second info.
</div>
<div class="info">
Third info.
</div>
</div>
I've defined a regular expression as follows:
QRegExp rx("<div class=\"info\">(.+)</div>");
It correctly matches all blocks, but the matched text includes all the subsequent blocks. For instance, for the "Second" block, it returns:
<div class="info">
Second info.
</div>
<div class="info">
Third info.
</div>
</div>
I thought I could just add ? to my regex to get the intended result:
QRegExp rx("<div class=\"info\">(.+?)</div>");
However, using this regex results in no match at all.
Upvotes: 2
Views: 89
Reputation: 14931
I've browsed the Qt regex docs. According to the quantifiers section, QRegExp has no way to make an individual quantifier lazy (non-greedy), unlike Perl-style regexes, where you can add ? after the quantifier. The note in that section says to use setMinimal() instead.
Here's a code sample:
#include <QRegExp>
#include <QString>
#include <QStringList>
#include <iostream>

int main()
{
    QString str =
        "<div class=\"_ss\">"
        "<div class=\"info\">"
        "First info."
        "</div>"
        "<div class=\"info\">"
        "Second info."
        "</div>"
        "<div class=\"info\">"
        "Third info."
        "</div>"
        "</div>"; // Some input
    QStringList list;
    int pos = 0;
    QRegExp rx("<div class=\"info\">(.+)</div>");
    rx.setMinimal(true); // Make the quantifiers in our regex lazy/ungreedy

    // Loop through our matches, resuming after the end of the previous one
    while ((pos = rx.indexIn(str, pos)) != -1) {
        list << rx.cap(1); // Add group 1 to our list
        pos += rx.matchedLength();
    }

    // Print each captured block
    for (int i = 0; i < list.size(); ++i) {
        std::cout << list.at(i).toStdString() << std::endl;
    }
    return 0;
}
Note: The captured text includes any whitespace inside the div around the content, so you might need to trim the results, e.g. with QString::trimmed().
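For comparison only, here is a minimal sketch in standard C++ rather than Qt. It assumes std::regex with its default ECMAScript grammar, which does accept the Perl-style lazy quantifier .+? directly. captureAll and sampleHtml are hypothetical helper names made up for this illustration, and the input is the question's markup collapsed onto one line. It is meant to show the greedy versus lazy difference, not to replace the setMinimal() approach.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): returns capture group 1
// of every non-overlapping match of `pattern` in `input`.
std::vector<std::string> captureAll(const std::string& input,
                                    const std::string& pattern)
{
    const std::regex rx(pattern); // ECMAScript grammar by default
    std::vector<std::string> captures;
    for (std::sregex_iterator it(input.begin(), input.end(), rx), end;
         it != end; ++it) {
        captures.push_back((*it)[1].str());
    }
    return captures;
}

// Hypothetical helper: the question's markup on a single line,
// since '.' in the ECMAScript grammar does not match line breaks.
std::string sampleHtml()
{
    return "<div class=\"_ss\">"
           "<div class=\"info\">First info.</div>"
           "<div class=\"info\">Second info.</div>"
           "<div class=\"info\">Third info.</div>"
           "</div>";
}
```

With this input, captureAll(sampleHtml(), "<div class=\"info\">(.+)</div>") should yield a single capture that runs through to the last </div>, while the same call with (.+?) yields three captures, one per info block.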
Upvotes: 1