I have a string in my app that contains an HTML img tag:
<img src="imagsource.jpg" width="imageWidth" />
Now I want to extract the image tag and its src attribute into two separate strings. This is what I tried:
QRegExp imageRegex("\\<img[^\\>]*src\\s*=\\s*\"([^\"]*)\"[^\\>]*\\>", Qt::CaseInsensitive);
int a = imageRegex.indexIn(description);
int b = a + imageRegex.matchedLength();
QString imgTag = description.mid(a,b); // this kind of works but doesn't return the img tag properly (extra information is included)
// how to obtain the "src" attribute, I have tried this: src\s*=\s*\"(.+?)" but it doesn't work
QString imgSrc = ??
I have looked at other posts about extracting substrings with a regex and tried the same patterns with QRegExp, but they don't seem to give the correct result.
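Two details in the code above likely explain what you see. First, QString::mid(position, n) takes a length as its second argument, not an end index, so description.mid(a, b) with b = a + matchedLength() returns up to a extra characters; mid(a, imageRegex.matchedLength()) is what is needed. Second, the pattern already has a capture group around the src value, and QRegExp exposes it through cap(1), so a second regex is not required. Since Qt may not be at hand, here is a minimal sketch of the same two points using the standard library's std::regex as a stand-in (an assumption, not the asker's setup). std::string::substr also takes (position, length), and std::smatch::str(1) returns group 1. The names extractImg and ImgMatch are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type: the whole <img ...> tag and its src value.
struct ImgMatch {
    std::string tag;  // full matched tag (group 0)
    std::string src;  // contents of capture group 1
};

// Sketch: the question's pattern ported to std::regex (ECMAScript grammar).
// Returns false if no <img> tag with a double-quoted src is found.
bool extractImg(const std::string& description, ImgMatch& out) {
    static const std::regex imageRegex(
        R"re(<img[^>]*src\s*=\s*"([^"]*)"[^>]*>)re",
        std::regex::ECMAScript | std::regex::icase);
    std::smatch m;
    if (!std::regex_search(description, m, imageRegex)) {
        return false;
    }
    const auto a = static_cast<std::string::size_type>(m.position(0));
    const auto len = static_cast<std::string::size_type>(m.length(0));
    // substr takes (position, length), like QString::mid(position, n):
    // pass the match length, not the end index a + len.
    out.tag = description.substr(a, len);
    // Capture group 1 already holds the src value.
    out.src = m.str(1);
    return true;
}
```

Calling extractImg on text that contains the tag from the question yields the tag alone in tag and imagsource.jpg in src.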
Give this a try
<img(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\ssrc\s*=\s*(?:(['"])([\S\s]*?)\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
https://regex101.com/r/qaQPPU/1
The src value is in capture group 2.
Readable regex
< img # Begin img tag
(?= \s )
(?= # Assertion (a pseudo atomic group)
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s src \s* = \s* # src Attribute
(?:
( ['"] ) # (1), Quote
( [\S\s]*? ) # (2), src Value
\1
)
)
# Have the value, just match the rest of tag
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
> # End tag
Update
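Use QRegularExpression, available since Qt 5.0, instead of QRegExp.
It is backed by PCRE, so its syntax is Perl-compatible and supports the per-quantifier lazy matching (*?) this pattern relies on, which QRegExp does not.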
Ref: http://doc.qt.io/qt-5/qregularexpression.html
Example:
#include <QRegularExpression>
#include <QRegularExpressionMatch>
#include <QString>

QRegularExpression re("<img(?=\\s)(?=(?:[^>\"']|\"[^\"]*\"|'[^']*')*?\\ssrc\\s*=\\s*(?:(['\"])([\\S\\s]*?)\\1))\\s+(?:\"[\\S\\s]*?\"|'[\\S\\s]*?'|[^>]*?)+>");
// match() scans the whole subject from position 0 by default
QRegularExpressionMatch match = re.match("<img src=\"imagsource.jpg\" width=\"imageWidth\" />");
if (match.hasMatch()) {
    QString matched = match.captured(2); // matched -> imagsource.jpg
    // ...
}
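To try the pattern outside Qt, here is a minimal sketch that ports it to the C++ standard library's std::regex (a swapped-in engine, not what this answer uses). The default ECMAScript grammar supports the lookaheads, lazy quantifiers and \1 backreference the pattern needs, and captures made inside a positive lookahead are kept, so group 2 still holds the src value. The function name findImgSrc is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Sketch: the answer's pattern with std::regex (ECMAScript grammar).
// On success, stores the src value (capture group 2) in `src`.
bool findImgSrc(const std::string& html, std::string& src) {
    static const std::regex re(
        R"re(<img(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\ssrc\s*=\s*(?:(['"])([\S\s]*?)\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>)re");
    std::smatch m;
    // regex_search scans from the start of the string.
    if (!std::regex_search(html, m, re)) {
        return false;
    }
    src = m.str(2);  // group 1 is the quote character, group 2 the value
    return true;
}
```

Because the quote character is captured in group 1 and matched again with \1, the same pattern handles both double- and single-quoted src values.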
You can use this:
<img.*src=(?:"(.*?)"|'(.*?)').*>
https://regex101.com/r/qaQPPU/3
The whole match is the entire tag. The src value is in group 1 when it is double-quoted, or in group 2 when it is single-quoted.
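Because the pattern uses an alternation, code that reads the result has to check which group participated. A minimal sketch with std::regex, used here as a stand-in for Qt (findSrcSimple is a hypothetical name):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Sketch: the pattern above with std::regex. Double-quoted values land in
// group 1 and single-quoted values in group 2, so pick the one that matched.
bool findSrcSimple(const std::string& html, std::string& src) {
    static const std::regex re(R"re(<img.*src=(?:"(.*?)"|'(.*?)').*>)re");
    std::smatch m;
    if (!std::regex_search(html, m, re)) {
        return false;
    }
    src = m[1].matched ? m.str(1) : m.str(2);
    return true;
}
```

Note that the greedy .* parts mean that, on a line containing several tags, the match can run from the first <img to the last src= and the last >.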