Reputation: 27
I'm having some problems understanding the regex pattern syntax.
I'm using Outlook interop to go through the HTMLBody of an email (.msg file).
I want to remove all images that reference the internet.
So I'm using Regex.Replace to find all image tags and replace them with text.
This is what I have:
string altText = " <i>*Reference to picture on the internet removed*</i> ";
string b = Regex.Replace(a, @"(<img([^>]+)>)", altText);
This works, but I only want to match the tags whose src points to the internet.
I found this in my Google search:
string matchString = Regex.Match(a, "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;
But it does not help, since it looks like all images have a src attribute. My goal is to write a Regex pattern, if possible, that checks whether the source (src) starts with http, https or www.
Can anyone help me with this?
Upvotes: 0
Views: 1453
Reputation: 1157
I would suggest using an HTML parser to find your image tags rather than applying a regex directly. You can then use a regex to check the src attribute if required.
In the meantime, I believe the following regex will produce the results you are expecting:
<img.+?src=[\"']((?:https?|www).*?)[\"'].*?>
Edit
It should also be noted that links can sometimes start with just // (protocol-relative URLs). The following regex should handle that:
<img.+?src=[\"']((?:https?|www|//).*?)[\"'].*?>
For a more extensive regex solution for matching URLs, please see "What is a good regular expression to match a URL?"
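To tie this back to the Regex.Replace call from the question, here is a minimal sketch of how such a pattern might be used to swap only the remote images for the placeholder text. The class name RemoteImageStripper and the method Strip are hypothetical names chosen for illustration, and the pattern is a tightened variant of the ones above rather than the exact same expression: it uses [^>] so a match cannot run past the end of one tag, and a backreference so the closing quote has to match the opening one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Outlook interop or any library.
public static class RemoteImageStripper
{
    // Placeholder text taken from the question.
    private const string AltText =
        " <i>*Reference to picture on the internet removed*</i> ";

    // Assumptions baked into this pattern:
    //   [^>]*?   stays inside a single <img ...> tag
    //   \ssrc    requires whitespace before src, so data-src is not matched
    //   (["'])   captures the opening quote; \1 requires the same closing quote
    //   (?:https?:|www\.|//)  treats http:, https:, www. and // as "internet" sources
    private static readonly Regex RemoteImg = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*([""'])(?:https?:|www\.|//)[^""']*\1[^>]*>",
        RegexOptions.IgnoreCase);

    // Replaces every <img> tag with a remote src by the placeholder text.
    // Tags with other sources (for example cid: or relative paths) are left alone.
    public static string Strip(string html)
    {
        return RemoteImg.Replace(html, AltText);
    }
}
```

With this in place, the call from the question would become something like string b = RemoteImageStripper.Strip(a);. Embedded images (src="cid:...") and relative paths are left untouched. Unquoted src values and URLs containing quote characters are not handled, which is one reason an HTML parser remains the more robust option.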
Upvotes: 1