Reputation: 11
I've loaded an HTML document into a string in .NET. I have this REGEX, which I can use to match URLs and replace them, but I need to match ONLY URLs that are NOT fully qualified.
If this is my string:
djdjdjdjdjdj src="www.example.com/images/x.gif" dkkdkdkdk src="/images/x.gif"
My result would look like this:
djdjdjdjdjdj src="subdomain.example.com/images/x.gif" dkkdkdkdk src="http://www.example.com/images/x.gif"
My thinking is I need a REGEX that will match strings that start with src or href and that do not have more than one period. This REGEX matches links that have at least one period, so it doesn't match them correctly:
(src|href)\=(\"(.+?)[\.](.+?)\")
Thanks for any info. I'm coding this in C#, but I only need the REGEX.
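A quick way to see why the pattern above misbehaves is to run it over the sample string. This is a minimal sketch using Python's re module instead of C#; the pattern is copied unchanged, and the names QUESTION_PATTERN and sample are just illustrative. Both src values match, and when a value contains no period the lazy .+? keeps going past the closing quote into the next attribute:

```python
import re

# The pattern from the question: src or href, "=", then a double-quoted
# value that contains at least one period (both .+? groups are lazy).
QUESTION_PATTERN = re.compile(r'(src|href)\=(\"(.+?)[\.](.+?)\")')

sample = ('djdjdjdjdjdj src="www.example.com/images/x.gif" '
          'dkkdkdkdk src="/images/x.gif"')

# Every attribute with a period matches, the fully qualified one included.
for m in QUESTION_PATTERN.finditer(sample):
    print(m.group(2))

# A value without a period: .+? is allowed to match quotes, so the match
# runs on until it finds a period inside the next attribute.
print(QUESTION_PATTERN.search('href="/" src="x.gif"').group(2))
```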
Upvotes: 1
Views: 699
Reputation: 6476
I would suggest you try something like the HTML Agility Pack parser, as recommended many times on this site: Looking for C# HTML parser
Also it wouldn't hurt to read this obscure blog entry by some Metallica fan before you start.
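To illustrate the parser route, here is a rough sketch that uses Python's standard-library html.parser as a stand-in for the HTML Agility Pack (a swapped-in library, not the one recommended above). The parser hands over each attribute value with the quotes already removed, whether the markup used double quotes, single quotes or none, so the "is this relative?" test can look at the value alone. The sketch treats a value as relative when urlsplit finds neither a scheme nor a host, rather than counting periods. BASE and RelativeLinkFinder are hypothetical names, and the class only collects the relative values and their absolute forms; it does not write a modified page back out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical base URL of the page; relative links are resolved against it.
BASE = "http://www.example.com/"


class RelativeLinkFinder(HTMLParser):
    """Collects src/href values that have no scheme and no host."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.fixes = []  # (original value, absolute value) pairs

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with quotes already stripped.
        for name, value in attrs:
            if name in ("src", "href") and value is not None:
                parts = urlsplit(value)
                if not parts.scheme and not parts.netloc:
                    self.fixes.append((value, urljoin(self.base, value)))


html = ('<img src="/images/x.gif">'
        "<img src='/images/y.gif'>"
        '<img src=/images/z.gif>'
        '<a href="http://www.example.com/a.html">a</a>')

finder = RelativeLinkFinder(BASE)
finder.feed(html)
finder.close()
for original, absolute in finder.fixes:
    print(original, "->", absolute)
```

One caveat: by the URL rules urlsplit follows, a scheme-less value such as www.example.com/images/x.gif is a relative path, so this check would flag it too; if such values occur in the page, they need their own rule.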
Upvotes: 3
Reputation: 119
Warning: HTML + regex = round peg + square hole.
That being said, here's the hammer you requested:
(src|href)\=(\"[^."]*\.?[^."]*\")
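Here is that pattern applied to the sample string from the question, as a sketch in Python; re.sub with a replacement function plays the same role as Regex.Replace with a MatchEvaluator in .NET. BASE and absolutize are hypothetical names. urljoin resolves each matched value against BASE and returns an already-absolute value unchanged, so a period-free absolute link such as http://localhost/ survives even though the period heuristic matches it. Values with two or more periods, such as /js/app.min.js, are still skipped.

```python
import re
from urllib.parse import urljoin

# Hypothetical base URL of the page being rewritten.
BASE = "http://www.example.com/"

# The pattern above: a double-quoted src/href value with at most one period.
RELATIVE_ATTR = re.compile(r'(src|href)\=(\"[^."]*\.?[^."]*\")')


def absolutize(html, base=BASE):
    def fix(m):
        attr = m.group(1)
        value = m.group(2)[1:-1]  # drop the surrounding double quotes
        # urljoin returns an already-absolute URL unchanged.
        return f'{attr}="{urljoin(base, value)}"'

    return RELATIVE_ATTR.sub(fix, html)


sample = ('djdjdjdjdjdj src="www.example.com/images/x.gif" '
          'dkkdkdkdk src="/images/x.gif"')
print(absolutize(sample))
```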
Upvotes: 1