JC.

Reputation: 11

Regex to match if a string does NOT have more than one period: matching URL paths that are NOT fully qualified

I've loaded an HTML doc into a string with .NET. I have a regex that I can use to match URLs and replace them, but I need to match ONLY URLs that are NOT fully qualified.

If this is my string:

djdjdjdjdjdj src="www.example.com/images/x.gif" dkkdkdkdk src="/images/x.gif

My result would look like this:

djdjdjdjdjdj src="subdomain.example.com/images/x.gif" dkkdkdkdk src="http://www.example.com/images/x.gif

My thinking is that I need a regex that matches strings that start with src or href and that do not have more than one period. The regex below matches any link that has at least one period, so it also matches fully qualified URLs, which is not what I want.

(src|href)\=(\"(.+?)[\.](.+?)\")

Thanks for any info. I'm coding this in C#, but I only need the regex.

Upvotes: 1

Views: 699

Answers (2)

Tj Kellie

Reputation: 6476

I would suggest you try something like the HTML Agility Pack, as recommended many times on this site: Looking for C# HTML parser

Also it wouldn't hurt to read this obscure blog entry by some Metallica fan before you start.

Upvotes: 3

Zen

Reputation: 119

Warning: HTML + regex = round peg + square hole

That being said, here's the hammer you requested:

(src|href)\=(\"[^."]*\.?[^."]*\")
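
Since the question mentions C#, here is a minimal sketch of how a pattern like this could be plugged into Regex.Replace. It assumes the second character class is meant to repeat (`[^."]*`), so that a value with at most one period matches. The class name RelativeUrlRewriter, the method Qualify, and the baseUrl parameter are made up for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RelativeUrlRewriter
{
    // Group 1: the attribute name (src or href).
    // Group 2: a double-quoted value that contains at most one period.
    private static readonly Regex RelativeAttr =
        new Regex("(src|href)=\"([^.\"]*\\.?[^.\"]*)\"", RegexOptions.IgnoreCase);

    // Prefixes every matched attribute value with baseUrl.
    // baseUrl is a hypothetical parameter, e.g. "http://www.example.com".
    public static string Qualify(string html, string baseUrl)
    {
        return RelativeAttr.Replace(
            html,
            m => m.Groups[1].Value + "=\"" + baseUrl + m.Groups[2].Value + "\"");
    }
}
```

Keep in mind that "at most one period" is only a rough stand-in for "not fully qualified": a value such as example.com/x (one period) would get the prefix too, and a relative path such as /js/app.min.js (two periods) would be skipped. Checking for a leading slash or a missing scheme may turn out to be a more reliable test.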

Upvotes: 1
