Mervin

Reputation: 113

How to get HTML code from webpage?

I'm trying to get HTML code from a specific webpage, but when I do it using

        // pageURL and htmlCode are declared elsewhere.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(pageURL);
        // The using blocks close the response and the reader even if reading fails.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader streamReader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("windows-1251")))
        {
            htmlCode = streamReader.ReadToEnd();
        }

or using WebClient, I get redirected to a login page and receive that page's HTML instead. Is there another way to get the HTML code?

I read some information here: How to get HTML from a current request, in a postback, but I didn't understand what I should do, or how and where to specify the URL.

P.S.: I'm logged in in my browser. Notepad++ gets exactly what I need via "right click - view source code".

Thanks.

Upvotes: 2

Views: 415

Answers (3)

Felice Pollano

Reputation: 33272

If you want to scrape an HTML page that requires authentication, I suggest you use WatiN to fill in the proper fields and navigate to the pages you want to download. It may seem like overkill at first glance, but it will save a lot of trouble later.

Upvotes: 0

Swomble

Reputation: 909

If the page you want to get to is behind a login screen, you're going to need to perform the login through code and add an associated CookieCollection (via the request's CookieContainer property) to hold the login cookie that the website will try to set on your request.

Alternatively, if you have a user who can help the program along, you could try listing the cookies for the site once they've logged in through their browser. Copy that cookie across and add it to the CookieCollection.
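A minimal sketch of that second option, assuming the session cookie has been copied by hand from the browser. The cookie name, value, domain, and URL here are placeholders, not anything from the original question:

```csharp
using System;
using System.IO;
using System.Net;

class FetchWithBrowserCookie
{
    static void Main()
    {
        // Hypothetical values: copy the real cookie name, value, and domain
        // from the browser after logging in.
        var cookies = new CookieContainer();
        cookies.Add(new Cookie("SESSIONID", "value-copied-from-browser", "/", "example.com"));

        // Placeholder URL for the page that normally redirects to the login screen.
        var request = (HttpWebRequest)WebRequest.Create("https://example.com/protected");
        request.CookieContainer = cookies;

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
```

This only works while the copied session is still valid on the server, and some sites need more than one cookie.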

Cheers, Simon

Upvotes: 1

Quentin

Reputation: 944442

If you get redirected to a login page, then presumably you must be logged in before you can get the content.

So you need to make a request, with suitable credentials, to the login page. Get whatever tokens are sent (usually in the form of cookies) to maintain the login. Then request the page you want (sending the cookies with the request).
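The steps above might look roughly like this with HttpWebRequest. This is only a sketch: the URLs, the form field names (`username`, `password`), and the assumption that the site uses a plain form POST are all hypothetical, and many sites also require a hidden anti-forgery token from the login form.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

class LoginThenFetch
{
    static void Main()
    {
        // Hypothetical URLs and form fields; replace with the real site's values.
        string loginUrl = "https://example.com/login";
        string pageUrl = "https://example.com/protected";
        string formData = "username=" + Uri.EscapeDataString("me")
                        + "&password=" + Uri.EscapeDataString("secret");

        // One container shared by both requests, so cookies set at login
        // are sent back with the second request.
        var cookies = new CookieContainer();

        // Step 1: post the credentials to the login page.
        var login = (HttpWebRequest)WebRequest.Create(loginUrl);
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        login.CookieContainer = cookies;
        byte[] body = Encoding.UTF8.GetBytes(formData);
        login.ContentLength = body.Length;
        using (Stream stream = login.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        using (var loginResponse = (HttpWebResponse)login.GetResponse())
        {
            // Nothing to read here; the cookies are now stored in the container.
        }

        // Step 2: request the protected page, sending the login cookies.
        var page = (HttpWebRequest)WebRequest.Create(pageUrl);
        page.CookieContainer = cookies;
        using (var response = (HttpWebResponse)page.GetResponse())
        // On .NET Core and later, windows-1251 needs the code pages
        // encoding provider to be registered first.
        using (var reader = new StreamReader(response.GetResponseStream(),
                                             Encoding.GetEncoding("windows-1251")))
        {
            string htmlCode = reader.ReadToEnd();
            Console.WriteLine(htmlCode.Length);
        }
    }
}
```

Sharing one CookieContainer between the two requests is what carries the login over; without it, the second request arrives with no cookies and gets redirected to the login page again.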

Alternatively (and this is the preferred approach), most major sites that expect automated systems to interact with them provide an API (often using OAuth for authentication). Consult their documentation to see how their API works.

Upvotes: 2
