Reputation: 1401
I need to extract the web address from this string:
<p> Feb 24 - <a href="http://austin.daylife.org/apa/2867907745.html">$390 / 2br - 600ft² - Sleeps 4-Walk to SXSW-SOCO-Perfect Location</a> - <font size="-1"> (South 5th)</font> <span class="p"> pic</span></p>
How can I do this using a regular expression in C#?
Upvotes: 0
Views: 462
Reputation: 43094
This works for me:
string source = " <p> Feb 24 - <a href=\"http://austin.daylife.org/apa/2867907745.html\">$390 / 2br - 600ft² - Sleeps 4-Walk to SXSW-SOCO-Perfect Location</a> - <font size=\"-1\"> (South 5th)</font> <span class=\"p\"> pic</span></p> ";
Regex regex = new Regex("<a[^>]*? href=\"(?<url>[^\"]+)\"[^>]*?>(?<text>.*?)</a>");
var m = regex.Match(source);
string url = m.Groups["url"].Value; // Groups["url"] is a Group; .Value gives the matched string
Upvotes: 1
Reputation: 33139
Use this regular expression:
http(s)?://([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?
EDIT: Simpler expression:
http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
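For reference, here is a minimal sketch of how the simpler expression might be applied in C#. It writes the pattern with an escaped dot (`\.`) and a `*` quantifier on the path character class, and it uses a shortened copy of the HTML from the question. The `Program` class and the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Shortened copy of the HTML fragment from the question.
        string source = "<p> Feb 24 - <a href=\"http://austin.daylife.org/apa/2867907745.html\">$390 / 2br</a></p>";

        // Verbatim string (@"...") so the regex backslashes need no extra escaping.
        string pattern = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

        Match m = Regex.Match(source, pattern);
        if (m.Success)
        {
            // Prints: http://austin.daylife.org/apa/2867907745.html
            Console.WriteLine(m.Value);
        }
        else
        {
            Console.WriteLine("No URL found.");
        }
    }
}
```

Note that the path character class contains a space, so in plain text (where the URL is not closed by a quote) the match can run on past the end of the URL. Here the closing `"` of the `href` attribute stops it.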
Upvotes: 1