Reputation: 19528
I am trying to get an ID from a url parameter inside an href that looks like this:
<a href="http://www.mysite.com/myitem.php?id=71312">MyItemName</a>
I want only the 71312, and at the moment I am trying to do it using a regex (but if you have a better approach I would be glad to try it):
string html, itemID;
using (var client = new WebClient())
{
    html = client.DownloadString("http://www.mysite.com/search.php?search_text=" + myItemName);
}
string pattern = "<a href=\"http://www.mysite.com/myitem.php?id=(\d+)\">" + myItemName + "</a>";
Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
if (m.Success)
{
    itemID = m.Groups[1].Value;
    MessageBox.Show(itemID);
}
Example of the HTML:
more html body
<h1>Items - List</h1>
<p><a href="http://www.mysite.com/myitem.php?id=12313">MyItemNameTest</a>, <a href="http://www.mysite.com/myitem.php?id=83">MyItemNameTestB</a>, <a href="http://www.mysite.com/myitem.php?id=213784">MYItemNameOther</a></p>
</div>
more html body
Upvotes: 0
Views: 621
Reputation: 2258
Use:
Uri u = new Uri("http://www.mysite.com/myitem.php?id=12313");
string s = u.Query;
string id = HttpUtility.ParseQueryString(s).Get("id"); // HttpUtility lives in System.Web
The variable id now holds the number. Figure out the rest of the function :)
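For what it's worth, here is one way the rest of the function might look. This is only a sketch: the class name ItemLinkParser, the method FindItemId, and the loose anchor pattern are assumptions for illustration, not something from the question. It finds the link whose text equals the item name, then reads the id through Uri and HttpUtility.ParseQueryString instead of hard-coding the URL in a regex.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web;

public static class ItemLinkParser
{
    // Hypothetical helper: returns the "id" query parameter of the first link
    // whose text equals itemName (ignoring case), or null if there is none.
    public static string FindItemId(string html, string itemName)
    {
        // Loose anchor match: group 1 is the href value, group 2 the link text.
        // Assumes double-quoted href attributes and plain-text link bodies.
        const string anchorPattern = "<a\\s+href=\"([^\"]+)\"[^>]*>([^<]*)</a>";

        foreach (Match m in Regex.Matches(html, anchorPattern, RegexOptions.IgnoreCase))
        {
            string linkText = m.Groups[2].Value.Trim();
            if (!string.Equals(linkText, itemName, StringComparison.OrdinalIgnoreCase))
                continue;

            // Attribute values may contain entities such as &amp;, so decode first.
            string href = HttpUtility.HtmlDecode(m.Groups[1].Value);

            Uri uri;
            if (!Uri.TryCreate(href, UriKind.Absolute, out uri))
                continue;

            // ParseQueryString accepts the query with its leading '?'.
            return HttpUtility.ParseQueryString(uri.Query).Get("id");
        }
        return null;
    }
}
```

With the example HTML from the question, ItemLinkParser.FindItemId(html, "MyItemNameTestB") should give "83". The link text is compared for equality, so MyItemNameTest does not accidentally pick up MyItemNameTestB.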
Upvotes: 0
Reputation: 336438
To show where your regex went wrong: the characters . and ? are special in regular expressions. . means "any character" and ? means "zero or one occurrences of the previous expression". Unescaped, php?id= matches phid= or phpid= but never a literal question mark, so your regex fails to match. Also, you need to use verbatim strings in C# (unless you want to escape every backslash); inside a verbatim string, a double quote is written as "":
@"<a href=""http://www\.mysite\.com/myitem\.php\?id=(\d+)"">" + myItemName + "</a>";
will probably work.
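Wrapped up as a function, a minimal sketch could look like the following. The names ItemIdRegex and ExtractItemId are made up for the example, and the Regex.Escape call on the item name is an addition of mine: without it, an item name containing characters such as . or ( would be read as regex syntax. Note the doubled quotes, which is how a verbatim string spells a literal double quote.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ItemIdRegex
{
    // Hypothetical helper: returns the numeric id from the link whose text is
    // exactly myItemName, or null when no such link is found.
    public static string ExtractItemId(string html, string myItemName)
    {
        // Verbatim string: backslashes are literal, "" stands for one double quote.
        // The dots and the question mark are escaped so they match literally.
        string pattern = @"<a href=""http://www\.mysite\.com/myitem\.php\?id=(\d+)"">"
                         + Regex.Escape(myItemName) + "</a>";

        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Because the pattern requires </a> right after the name, searching for MyItemNameTest in the question's example HTML does not match the MyItemNameTestB link.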
That said, unless all the links you're examining follow exactly this format, you might run into problems. It's kind of a running gag here on SO that parsing HTML with regular expressions will earn you the wrath of Cthulhu.
Upvotes: 1