Efog

Reputation: 1179

Regular expression not working as planned

I have to parse specific HTML code from a website. Here is part of it:

<div class="_ss">
    <div class="info">
        First info.
    </div>
    <div class="info">
        Second info.
    </div>
    <div class="info">
        Third info.
    </div>
</div>

I've defined a regular expression as follows:

QRegExp rx("<div class=\"info\">(.+)</div>");

It finds every block, but the matched text also includes all of the subsequent blocks. For instance, for the second block it returns:

    <div class="info">
        Second info.
    </div>
    <div class="info">
        Third info.
    </div>
</div>

I thought I could just add ? to my regex to get the intended result:

QRegExp rx("<div class=\"info\">(.+?)</div>");

However, using this regex results in no match at all.

Upvotes: 2

Views: 89

Answers (1)

HamZa

Reputation: 14931

I've browsed Qt's regex (QRegExp) documentation. Judging by its quantifiers section, there is no way to make an individual quantifier lazy (ungreedy), unlike in Perl-style regexes, where you append ? to the quantifier. According to the note in that section, you need to call setMinimal(true) instead, which switches the whole pattern to minimal (non-greedy) matching.

Here's a code sample:

#include <QRegExp>
#include <QString>
#include <QStringList>
#include <iostream>

int main()
{
    // Some input, mirroring the HTML from the question
    QString str = "<div class=\"_ss\">\n"
                  "    <div class=\"info\">\n"
                  "        First info.\n"
                  "    </div>\n"
                  "    <div class=\"info\">\n"
                  "        Second info.\n"
                  "    </div>\n"
                  "    <div class=\"info\">\n"
                  "        Third info.\n"
                  "    </div>\n"
                  "</div>";

    QStringList list;
    int pos = 0;

    QRegExp rx("<div class=\"info\">(.+)</div>");
    rx.setMinimal(true); // Make our regex lazy/ungreedy

    // Loop through our matches
    while ((pos = rx.indexIn(str, pos)) != -1) {
        list << rx.cap(1);          // Add capture group 1 to our list
        pos += rx.matchedLength();  // Continue searching after this match
    }

    // Loop and print
    for (int i = 0; i < list.size(); ++i) {
        std::cout << list.at(i).toStdString() << std::endl;
    }

    return 0;
}

Note: the captured text includes the surrounding whitespace, so you may want to trim each result, for example with list << rx.cap(1).trimmed();

Upvotes: 1
