Reputation: 175
How to screen scrape HTTPS using C#?
Upvotes: 3
Views: 4390
Reputation: 6729
Here's a concrete (albeit trivial) example. You can pass a ship name to VesselFinder in the querystring, but even if it only finds one ship with that name it still shows you the search results screen with one ship. This example detects that case and takes the user straight to the tracking map for the ship.
string strName = "SAFMARINE MAFADI";
string strURL = "https://www.vesselfinder.com/vessels?name=" + HttpUtility.UrlEncode(strName);
string strReturnURL = strURL;
string strToSearch = "/?imo=";
string strPage = string.Empty;
byte[] aReqtHTML;
WebClient objWebClient = new WebClient();
objWebClient.Headers.Add("User-Agent: Other"); //You must do this or HTTPS won't work
aReqtHTML = objWebClient.DownloadData(strURL); //Do the name search
UTF8Encoding utf8 = new UTF8Encoding();
strPage = utf8.GetString(aReqtHTML); // get the string from the bytes
if (strPage.IndexOf(strToSearch) != strPage.LastIndexOf(strToSearch))
{
//more than one instance found, so leave return URL as name search
}
else if (strPage.Contains(strToSearch) == true)
{
//find the ship's IMO
strPage = strPage.Substring(strPage.IndexOf(strToSearch)); //cut off the stuff before
strPage = strPage.Substring(0, strPage.IndexOf("\"")); //cut off the stuff after
}
strReturnURL = "https://www.vesselfinder.com" + strPage;
Upvotes: 1
Reputation: 3046
You can use System.Net.WebClient to grab web pages. Here is an example: http://www.codersource.net/csharp_screen_scraping.html
Upvotes: 4
Reputation: 13723
If for some reason you're having trouble with accessing the page as a web-client or you want to make it seem like the request is from a browser, you could use the web-browser control in an app, load the page in it and use the source of the loaded content from the web-browser control.
Upvotes: 2
Reputation: 5477
You can use System.Net.WebClient to start an HTTPS connection, and pull down the page to scrape with that.
Upvotes: 5