adils

Reputation: 11

Need help extracting label from HTML page in C#

I want to load the value of a single label from a remote HTML page. So far I have done that by downloading the whole page and then running a regex over it. That gives me the right result, but it is very slow. I would like to load only the label's value quickly, not the whole web page. Any suggestions?

This is what I'm doing at the moment:

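// Requires: using System.Net; using System.Text.RegularExpressions;
using (var client = new WebClient())
{
    // Downloads the entire page as a string before any matching happens.
    string result = client.DownloadString("http://web.archive.org/http://profiles.yahoo.com/italy_");
    var regex = new Regex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
                          RegexOptions.Compiled);
    foreach (Match email in regex.Matches(result))
    {
        // Console.WriteLine(email.Value);
        // Each match overwrites the previous one, so the label ends up showing the last match.
        label2.Text = email.Value;
    }
}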

Upvotes: 1

Views: 114

Answers (2)

Tim M.

Reputation: 54417

"I found the desired result but this method is very slow I want it to quickly load only labels value not the whole web page."

A couple of thoughts:

  • Archive.org is usually very slow in my experience. My guess is that's your bottleneck.

  • No, there is not a way to only make a partial request to a third-party page unless they have a response mechanism capable of returning more specific data (for example, a JSON-enabled web service that returns little snippets of HTML used on the page).

  • You will usually have better luck with parsing by loading data into some kind of HTML parser rather than using a regex.
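One way to check the bottleneck guess above is to time the download and the parsing step separately. The following is only a minimal sketch: `FetchTimer` and `Measure` are hypothetical names introduced here for illustration, and the value tuple return type assumes C# 7 or later.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not from the original post): times the two steps separately.
public static class FetchTimer
{
    // download: any function that returns the page text, e.g. a WebClient call.
    // parse:    any function that extracts the wanted value from that text.
    public static (string Result, TimeSpan Download, TimeSpan Parse) Measure(
        Func<string> download, Func<string, string> parse)
    {
        var stopwatch = Stopwatch.StartNew();
        string html = download();
        TimeSpan downloadTime = stopwatch.Elapsed;

        stopwatch.Restart();
        string result = parse(html);
        TimeSpan parseTime = stopwatch.Elapsed;

        return (result, downloadTime, parseTime);
    }
}
```

With the code from the question, this could be called as `FetchTimer.Measure(() => client.DownloadString(url), html => regex.Match(html).Value)`. If the download time dominates, tuning the regex or switching to an HTML parser will not make the request noticeably faster.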

Upvotes: 2

Nathan

Reputation: 6216

You must load the whole page; that's the way HTTP requests generally work.

Maybe your regex could be improved? Not my area of expertise though, sorry.
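On the regex side, one possible improvement, assuming only the first e-mail address on the page is wanted, is to build the `Regex` once and call `Match` instead of iterating over `Matches`, so scanning stops at the first hit. This is a sketch rather than a drop-in fix: `EmailExtractor` and `FirstEmail` are hypothetical names, and the question's loop actually leaves the last match in the label, so the result can differ on pages with several addresses.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original post).
public static class EmailExtractor
{
    // Same pattern as in the question, constructed once and reused across calls.
    private static readonly Regex EmailRegex = new Regex(
        @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
        RegexOptions.Compiled);

    // Returns the first e-mail-like substring, or null if there is none.
    public static string FirstEmail(string html)
    {
        Match match = EmailRegex.Match(html);
        return match.Success ? match.Value : null;
    }
}
```

In the question's code this would replace the `foreach` loop with `label2.Text = EmailExtractor.FirstEmail(result) ?? "";`. It only trims the matching step; the page still has to be downloaded in full first.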

Upvotes: 2
