Reputation:
I want to get a URL from a string. Here's my code to extract an img URL.
var imgReg = new Regex("img\\s*src\\s*=\\s*\"(.*)\"");
string imgLink = imgReg.Match(page, l, r - l).Groups[1].Value;
The result was
http://url.com/file.png" border="0" alt="
How do I fix this so it ends at the first "? I tried something like
var imgReg = new Regex("img\\s*src\\s*=\\s*\"(.*[^\\\"])\"");
But I got the same results as the original.
Upvotes: 2
Views: 392
Reputation: 17528
Your .* is too greedy. Change it to the following and it will select everything up to the next double-quote.
Source Text: <img src="http://url.com/file.png" border="0" alt="" />
<img src='http://url.com/file.png' border='0' alt='' />
RegEx: <img\s*src\s*=\s*[\"\']([^\"\']+)[\"\']
I just changed the (.*) to ([^\"\']+). This means that you'll grab every non-quote character up to the closing quote. It also supports single or double quotes.
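For what it's worth, here is a minimal sketch of that pattern in C#, written as top-level statements (C# 9 or later). The sample input and the variable names page and urls are my own assumptions, not taken from your page, and I used \s+ after img so that "imgsrc" is not accepted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample input: one tag with double quotes, one with single quotes.
string page =
    "<img src=\"http://url.com/file.png\" border=\"0\" alt=\"\" />\n" +
    "<img src='http://url.com/other.png' border='0' alt='' />";

// [^"']+ consumes only non-quote characters, so the capture group
// ends at the first closing quote rather than the last one on the line.
// Inside a verbatim string (@"..."), a doubled "" stands for one literal ".
var imgReg = new Regex(@"<img\s+src\s*=\s*[""']([^""']+)[""']");

var urls = new List<string>();
foreach (Match m in imgReg.Matches(page))
{
    urls.Add(m.Groups[1].Value);
}

foreach (string url in urls)
{
    Console.WriteLine(url);
}
```

Note that the character class accepts either kind of quote on both sides, so it does not check that the opening and closing quotes are the same kind.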
Upvotes: 1
Reputation: 1513
Try this:
var imgReg = new Regex(@"img\s+src\s*=\s*""([^""']*)""");
Also, note the "\s+" instead of "\s*" after "img". You need at least one space there.
You can also use the non-greedy (or "lazy") version of the star operator, which, instead of matching as much as possible, matches as little as possible and stops, as you would like, at the first ending quote:
var imgReg = new Regex(@"img\s+src\s*=\s*""(.*?)""");
(note the "?" after ".*")
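To make the difference concrete, here is a small side-by-side sketch, written as top-level statements (C# 9 or later). The sample tag is an assumption on my part, built to resemble the markup in the question; the names tag, greedy and lazy are just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample tag resembling the one in the question.
string tag = "<img src=\"http://url.com/file.png\" border=\"0\" alt=\"\" />";

// Greedy: .* runs to the end of the line, then backtracks to the LAST quote.
string greedy = Regex.Match(tag, @"img\s+src\s*=\s*""(.*)""").Groups[1].Value;

// Lazy: .*? expands one character at a time and stops at the FIRST quote.
string lazy = Regex.Match(tag, @"img\s+src\s*=\s*""(.*?)""").Groups[1].Value;

Console.WriteLine(greedy); // http://url.com/file.png" border="0" alt="
Console.WriteLine(lazy);   // http://url.com/file.png
```

The greedy result is the same overshoot shown in the question, which is why either the lazy quantifier or the negated character class fixes it.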
Upvotes: 4
Reputation: 25523
Please consider using a DOM (such as the Html Agility Pack) to parse HTML rather than using regular expressions. A DOM should handle all edge cases; regular expressions won't.
Upvotes: 3
Reputation: 558
What it looks like to me is, your (.*) is catching the double quotes you don't want to match.
You can use "" inside a verbatim string (@"...") to match a double quote, or do something like this to match the link directly:
Regex.Match(input, @"http://[\w./-]+\.png");
Upvotes: 0