Reputation: 89
I'm trying to parse text from a webpage that requires a username and password (or download the text as a .txt file). I've been searching the web and Stack Overflow for a few days looking for a solution. It seems like there should be a simple one, but so far I haven't found it. The code below is the most logical and straightforward I've found so far. It currently returns an HTTP 401 (Unauthorized) error.
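private void Form1_Load(object sender, EventArgs e)
{
    // Accept every server certificate. This disables TLS validation, so use it for testing only.
    ServicePointManager.ServerCertificateValidationCallback =
        new RemoteCertificateValidationCallback(delegate { return true; });

    // CookieAwareWebClient is a custom WebClient subclass (not shown) that keeps cookies between requests.
    using (var client = new CookieAwareWebClient())
    {
        var values = new NameValueCollection
        {
            { "username", "username" },
            { "password", "password" },
        };
        // Post the login form, then request the protected page with the same client.
        client.UploadValues("https://website/", values);
        string result = client.DownloadString("https://website/licences");
        lbl1.Text = result;
    }
}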
Upvotes: 0
Views: 520
Reputation: 2792
Just use an HttpWebRequest to retrieve data from the external site and parse what you need from the WebResponse. Depending on the authentication mechanism the site uses (basic authentication, forms authentication, etc.), you will need slightly different techniques to authenticate. The accepted answer in this SO post has some good examples. To paraphrase: if it is Basic auth or Windows auth, you can use the NetworkCredential class and pass it with the request. If it uses some kind of cookie-based auth, you will have to construct a form post, get the auth cookie, and then pass that cookie along with your request for data.
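As a rough illustration of the two cases described above, here is a minimal sketch. It uses WebClient (a thin wrapper over HttpWebRequest) to match the question's code. The class name AuthSketch, the method names, and the form field names "username" and "password" are assumptions for illustration; the real field names and login URL depend on the target site's login form. The CookieAwareWebClient shown here is one common way to write such a class, not necessarily the one the question uses.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// A WebClient that keeps cookies between requests, so the auth cookie
// set by the login POST is sent again with later requests.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest http)
        {
            // Attach the shared cookie jar to every HTTP request.
            http.CookieContainer = Cookies;
        }
        return request;
    }
}

// Hypothetical helper class; names are illustrative only.
public static class AuthSketch
{
    // Basic or Windows auth: pass the credentials with the request.
    public static string DownloadWithCredentials(string url, string user, string password)
    {
        using (var client = new WebClient())
        {
            client.Credentials = new NetworkCredential(user, password);
            return client.DownloadString(url);
        }
    }

    // Cookie-based (forms) auth: post the login form, then reuse the cookie.
    public static string DownloadAfterFormLogin(
        string loginUrl, string dataUrl, string user, string password)
    {
        using (var client = new CookieAwareWebClient())
        {
            var form = new NameValueCollection
            {
                { "username", user },     // assumed field name
                { "password", password }  // assumed field name
            };
            client.UploadValues(loginUrl, form);   // server sets the auth cookie
            return client.DownloadString(dataUrl); // cookie is sent automatically
        }
    }
}
```

If the server keeps answering 401 with the second method, the site is probably not using a plain form post (it may expect extra hidden fields such as an anti-forgery token), so it is worth comparing the request against what a browser actually sends.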
Upvotes: 1
Reputation: 3115
Yes, there is a simple solution.
Since you need to scrape some text from a third-party website, you need a browser. You need to do it programmatically, hence you need a programmable browser.
There are some programmable browsers and related libraries available for .NET (listed below). You can include them in your project as NuGet packages and build your requirement from there (i.e. write code to find the input boxes, type the username and password, click the login button, etc.).
HTML Agility Pack - http://htmlagilitypack.codeplex.com/
Webkit - http://sourceforge.net/projects/webkitdotnet/
Watin - http://watin.org/
SimpleBrowser - https://github.com/axefrog/SimpleBrowser
Along with this you can use CsQuery to parse your DOM like you would with jQuery. Yes, CsQuery is a C# port of jQuery. It's really a great tool.
CsQuery - https://github.com/jamietre/CsQuery
Upvotes: 0