reckless

Reputation: 933

How to get the src of an image tag using QRegExp and QString

I have a string in my app that contains an HTML img tag:

<img src="imagsource.jpg" width="imageWidth" />

I want to extract the img tag and its src attribute into two separate strings. This is what I tried:

QRegExp imageRegex("\\<img[^\\>]*src\\s*=\\s*\"([^\"]*)\"[^\\>]*\\>", Qt::CaseInsensitive);

int a = imageRegex.indexIn(description);
int b = a + imageRegex.matchedLength();

QString imgTag = description.mid(a,b); // this kind of works but doesn't return the img tag properly (extra information is included)

// how to obtain the "src" attribute, I have tried this: src\s*=\s*\"(.+?)" but it doesn't work
QString imgSrc = ??

I have looked at other posts about extracting substrings with a regex and tried the same patterns in QRegExp, but they don't give the correct result.

Upvotes: 0

Views: 323

Answers (2)

user557597

Give this a try

<img(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\ssrc\s*=\s*(?:(['"])([\S\s]*?)\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>

https://regex101.com/r/qaQPPU/1

The src value is in capture group 2.

Readable regex

 < img                  # Begin img tag
 (?= \s )
 (?=                    # Assertion (a pseudo atomic group)
      (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
      \s src \s* = \s*       # src Attribute
      (?:
           ( ['"] )               # (1), Quote
           ( [\S\s]*? )           # (2), src Value
           \1 
      )
 )
                        # Have the value, just match the rest of tag
 \s+ 
 (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+

 >                      # End tag

Update

Use Qt 5 or later, which provides QRegularExpression (introduced in Qt 5.0).

QRegularExpression uses Perl-compatible regular expressions, so the pattern above can be used as written.

Ref: http://doc.qt.io/qt-5/qregularexpression.html

Example:

QRegularExpression re("<img(?=\\s)(?=(?:[^>\"']|\"[^\"]*\"|'[^']*')*?\\ssrc\\s*=\\s*(?:(['\"])([\\S\\s]*?)\\1))\\s+(?:\"[\\S\\s]*?\"|'[\\S\\s]*?'|[^>]*?)+>");
// Match from the start of the subject; a non-zero offset would skip the leading '<'.
QRegularExpressionMatch match = re.match("<img src=\"imagsource.jpg\"     width=\"imageWidth\" />");
if (match.hasMatch()) {
    QString matched = match.captured(2); // matched -> imagsource.jpg
    // ...
}
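
The question asks for both the tag and its src value as separate strings. Those are capture group 0 (the whole match) and capture group 2. Note that QString::mid takes a position and a length, so description.mid(a, b) with b = a + matchedLength() reads past the end of the tag whenever the tag does not start at position 0, which is likely the extra text seen in the question. Below is a minimal sketch of the same extraction using std::regex from the C++ standard library instead of Qt, chosen only so that it compiles without Qt. The helper name extractImg is hypothetical. With QRegularExpression the corresponding calls would be match.captured(0) and match.captured(2).

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of Qt or the standard library).
// On success, stores the whole img tag in `tag` and the src value in `src`.
// Uses std::regex (ECMAScript grammar) as a stand-in for QRegularExpression.
bool extractImg(const std::string &html, std::string &tag, std::string &src)
{
    // Same pattern as above, written as a raw string literal
    // so that no backslash or quote escaping is needed.
    static const std::regex re(
        R"re(<img(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\ssrc\s*=\s*(?:(['"])([\S\s]*?)\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>)re");

    std::smatch m;
    if (!std::regex_search(html, m, re))
        return false;

    tag = m[0].str(); // group 0: the whole tag
    src = m[2].str(); // group 2: the src value, without its quotes
    return true;
}
```

Calling extractImg(description, tag, src) on a string that contains the tag from the question should leave the full img tag in tag and imagsource.jpg in src. It returns false when no img tag with a quoted src attribute is found.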

Upvotes: 2

Jacob Boertjes

Reputation: 981

You can use this:

<img.*src=(?:"(.*?)"|'(.*?)').*>

https://regex101.com/r/qaQPPU/3

The whole match is the entire tag. The contents of the src attribute are in the first group if the value is double-quoted, or in the second group if it is single-quoted.
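
Because the pattern has one group per quote style, the caller has to check which group took part in the match. Here is a minimal sketch of that check, again using std::regex rather than Qt so that it builds without Qt. The helper name srcFromImg is hypothetical. One caveat: the `.*` parts are greedy, so if a line holds more than one img tag, the match spans all of them and the captured src comes from the last tag.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the src value found by the pattern above,
// or an empty string when nothing matches.
std::string srcFromImg(const std::string &html)
{
    static const std::regex re(R"re(<img.*src=(?:"(.*?)"|'(.*?)').*>)re");

    std::smatch m;
    if (!std::regex_search(html, m, re))
        return std::string();

    // Only one of the two groups participates in any given match:
    // group 1 for a double-quoted value, group 2 for a single-quoted one.
    return m[1].matched ? m[1].str() : m[2].str();
}
```

With QRegularExpression, the same check could be done by testing whether match.captured(1) is empty before falling back to match.captured(2).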

Upvotes: 0
