Get the number of an href url parameter from downloaded html page?

Question

I am trying to get an ID from a url parameter inside an href that looks like this:

MyItemName

I want only the 71312, and at the moment I am trying to do it using regex (but if you have a better approach, I would be glad to try it):

        string html,itemID;
        using (var client = new WebClient())
        {
            html = client.DownloadString("http://www.mysite.com/search.php?search_text=" + myItemName);
        }

        string pattern = "" + myItemName + "";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        if (m.Success)
        {
            itemID = m.Groups[1].Value;
            MessageBox.Show(itemID);
        }

Example of the html:

more html body
Items - List
MyItemNameTest, MyItemNameTestB, MYItemNameOther


more html body

Tim Pietzcker · Accepted Answer

To show where your regex went wrong:

. and ? are special characters in regular expressions. . means "any character" and ? means "zero or one occurrences of the previous expression", so an unescaped ? in the pattern never matches the literal ? in the URL. Therefore your regex fails to match. Also, you need to use verbatim strings in C# (unless you want to escape every backslash):

@"" + myItemName + "";

will probably work.
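As a rough sketch of the escaping described above: the link format here is an assumption, since the exact markup is not shown, so `item.php?id=...` is a hypothetical URL shape, and `HrefIdExtractor` and `TryExtractId` are illustrative names rather than anything from the question. The pattern is a verbatim string, the `.` and `?` are escaped, and `Regex.Escape` guards against metacharacters in the item name itself.

```csharp
using System.Text.RegularExpressions;

public static class HrefIdExtractor
{
    // Assumed link shape (hypothetical, not taken from the question):
    //   <a href="item.php?id=71312">MyItemName</a>
    public static bool TryExtractId(string html, string itemName, out string id)
    {
        // Verbatim string: \. and \? reach the regex engine unchanged,
        // and "" stands for a single double quote.
        string pattern = @"<a\s+href=""item\.php\?id=(\d+)""\s*>"
                         + Regex.Escape(itemName)   // item name treated as literal text
                         + @"</a>";

        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);

        // Group 1 is the run of digits after "id=".
        id = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

If the real links carry extra attributes or a different URL, the pattern would need adjusting, which is part of why this approach stays fragile.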

That said, unless all the links you're examining follow exactly this format, you might run into problems. It's kind of a running gag here on SO that parsing HTML with regular expressions will earn you the wrath of Cthulhu.
