Andreas

Reputation: 27

Regex pattern for finding HTML image tag with src to the internet

I'm having some problems understanding the regex pattern syntax. I'm using Outlook interop to go through the HTMLBody of an email (.msg file).

I want to remove all the images that reference the internet, so I'm using Regex.Replace to find all image tags and replace them with text.

This is what I have:

using System.Text.RegularExpressions;

string altText = " <i>*Reference to picture on the internet removed*</i> ";
string b = Regex.Replace(a, @"(<img([^>]+)>)", altText);

This works, but I only want to match the tags whose src points to the internet. I found this in my Google search:

string matchString = Regex.Match(a, "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;

But that does not help, since it looks like all the images have a src attribute. My goal is to write a regex pattern, if possible, that checks whether the source (src) starts with http, https or www.

Is there anyone who can help me with this?

Upvotes: 0

Views: 1453

Answers (1)

StfBln

Reputation: 1157

I would suggest using an HTML parser to find your image tags rather than a regex directly. You can then use a regex to check the src attribute if required.
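The two-step idea (find the tags, then test only the src value) can be sketched with the standard library alone. This is a minimal sketch, not an HTML parser: as a swapped-in stand-in for the parser it uses a broad <img> pattern like the one in the question, then a second regex on each tag's src value inside a MatchEvaluator. The class and method names are hypothetical, and treating http:, https:, www. and // as "internet" sources is an assumption based on the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; a real HTML parser would replace the ImgTag step.
public static class RemoteImageStripper
{
    // Step 1: find every <img ...> tag (broad pattern, like the question's).
    private static readonly Regex ImgTag =
        new Regex(@"<img\b[^>]*>", RegexOptions.IgnoreCase);

    // Step 2: read the src value from one tag. The quote is captured so the
    // closing quote must match the opening one; the lookbehind skips data-src.
    private static readonly Regex SrcAttr =
        new Regex(@"(?<![\w-])src\s*=\s*([""'])(.*?)\1", RegexOptions.IgnoreCase);

    // Assumption: these prefixes mean "points to the internet".
    private static readonly Regex RemoteSrc =
        new Regex(@"^\s*(?:https?:|www\.|//)", RegexOptions.IgnoreCase);

    public static string StripRemoteImages(string html, string altText)
    {
        return ImgTag.Replace(html, tag =>
        {
            Match src = SrcAttr.Match(tag.Value);
            // Replace the tag only when its src is remote; otherwise keep it as-is.
            return src.Success && RemoteSrc.IsMatch(src.Groups[2].Value)
                ? altText
                : tag.Value;
        });
    }
}
```

Embedded images in Outlook mail typically use cid: sources, which this check leaves alone.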

In the meantime, I believe the following regex will produce the results you are expecting:

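<img[^>]+?src=[\"']((?:https?|www)[^\"']*)[\"'][^>]*>

Using [^>] instead of . keeps a match from running past the end of one <img> tag into the next one, and [^\"']* stops the capture at the closing quote. Use it with RegexOptions.IgnoreCase so that <IMG SRC=...> is matched too.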
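Plugged into the Regex.Replace call from the question, it might look like the minimal sketch below. It assumes the HTML is already in a string (named a, as in the question); the class and method names are hypothetical. The pattern uses [^>] and [^"'] rather than . so a single match stays inside one <img> tag, and RegexOptions.IgnoreCase also catches upper-case tags.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the question's Regex.Replace call.
public static class RemoteImageRegex
{
    // <img ...src="http..." / "https..." / "www..." ...>, confined to one tag by [^>].
    private const string Pattern =
        @"<img[^>]+?src=[""']((?:https?|www)[^""']*)[""'][^>]*>";

    public static string Strip(string a)
    {
        string altText = " <i>*Reference to picture on the internet removed*</i> ";
        // Only tags whose src starts with http, https or www are replaced.
        return Regex.Replace(a, Pattern, altText, RegexOptions.IgnoreCase);
    }
}
```

A local image placed directly before a remote one is left untouched, because the match cannot start at the local tag and cross its closing >.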


Edit: note also that links can sometimes start with just // (protocol-relative URLs). The following regex should handle that as well:

<img[^>]+?src=[\"']((?:https?|www|//)[^\"']*)[\"'][^>]*>
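If you also want to know which URLs were removed (for logging, say), capture group 1 holds the src value, as in the Groups[1].Value snippet from the question. A minimal sketch, assuming the same single-tag character classes as above; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that lists remote image sources instead of replacing them.
public static class RemoteImageSources
{
    // Same idea as the pattern above, with // for protocol-relative URLs.
    private static readonly Regex RemoteImg = new Regex(
        @"<img[^>]+?src=[""']((?:https?|www|//)[^""']*)[""'][^>]*>",
        RegexOptions.IgnoreCase);

    // Returns capture group 1 (the src value) of every matching <img> tag.
    public static List<string> Find(string html)
    {
        var sources = new List<string>();
        foreach (Match m in RemoteImg.Matches(html))
        {
            sources.Add(m.Groups[1].Value);
        }
        return sources;
    }
}
```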

For a more extensive regex solution for matching URLs, please see the question "What is a good regular expression to match a URL?"

Upvotes: 1
