Jignesh

Reputation: 175

Screen scraping HTTPS using C#

How do I screen scrape a page served over HTTPS using C#?

Upvotes: 3

Views: 4390

Answers (5)

SteveCav

Reputation: 6729

Here's a concrete (albeit trivial) example. You can pass a ship name to VesselFinder in the query string, but even if the search finds only one ship with that name, it still shows the search results page listing that single ship. This example detects that case and takes the user straight to the tracking map for the ship.

        //requires: using System.Net; using System.Text; using System.Web;
        string strName = "SAFMARINE MAFADI";
        string strURL = "https://www.vesselfinder.com/vessels?name=" + HttpUtility.UrlEncode(strName);
        string strReturnURL = strURL;   //default: fall back to the name search results
        string strToSearch = "/?imo=";
        string strPage = string.Empty;
        byte[] aReqtHTML;

        WebClient objWebClient = new WebClient();
        objWebClient.Headers.Add("User-Agent: Other");   //You must do this or HTTPS won't work
        aReqtHTML = objWebClient.DownloadData(strURL);  //Do the name search
        UTF8Encoding utf8 = new UTF8Encoding();

        strPage = utf8.GetString(aReqtHTML); // get the string from the bytes

        int iFirst = strPage.IndexOf(strToSearch);
        if (iFirst != strPage.LastIndexOf(strToSearch))
        {
            //more than one instance found, so leave return URL as name search
        }
        else if (iFirst >= 0)
        {
            //exactly one instance found: cut out the ship's relative link (/?imo=...)
            strPage = strPage.Substring(iFirst); //cut off the stuff before
            int iEnd = strPage.IndexOf("\"");    //the link ends at the closing quote
            if (iEnd > 0)
            {
                strReturnURL = "https://www.vesselfinder.com" + strPage.Substring(0, iEnd);
            }
        }
        //no instance found: strReturnURL is still the name search URL

Upvotes: 1

zfedoran

Reputation: 3046

You can use System.Net.WebClient to grab web pages. Here is an example: http://www.codersource.net/csharp_screen_scraping.html

Upvotes: 4

Cyril Gupta

Reputation: 13723

If you're having trouble accessing the page with a web client, or you want the request to look like it comes from a browser, you could host the WebBrowser control in an app, load the page in it, and read the source of the loaded document from the control.

Upvotes: 2

RichardOD

Reputation: 29157

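Look into the Html Agility Pack. It is an HTML parser for .NET that lets you query the downloaded page with XPath instead of searching the raw string.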
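To make that suggestion a bit more concrete, here is a minimal sketch of pulling every link out of a page that has already been downloaded as a string. It assumes the HtmlAgilityPack NuGet package is installed; the class and method names (LinkExtractor, ExtractLinks) are made up for illustration and are not part of the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;   // assumption: NuGet package "HtmlAgilityPack" is referenced

static class LinkExtractor   // hypothetical helper, not part of the library
{
    // Returns the href value of every <a> element in the given HTML.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);   // parses the markup, tolerating malformed HTML

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}
```

You would call it as LinkExtractor.ExtractLinks(pageHtml) and then filter the returned list, for example for links that start with a known prefix, rather than cutting the string up by hand.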

Upvotes: 5

Brett Allen

Reputation: 5477

You can use System.Net.WebClient to start an HTTPS connection, and pull down the page to scrape with that.
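As a rough sketch of what that might look like: the helper below downloads an HTTPS page into a string. The names HttpsScraper and DownloadPage and the User-Agent value are placeholders of mine. The TLS 1.2 line is an assumption that only matters on older .NET Framework versions where it is not enabled by default, and the https-only check is just a guard for this example.

```csharp
using System;
using System.Net;
using System.Text;

static class HttpsScraper   // hypothetical helper name
{
    // Downloads the page at the given https:// URL and returns its body as a string.
    public static string DownloadPage(string url)
    {
        var uri = new Uri(url);
        if (uri.Scheme != Uri.UriSchemeHttps)
        {
            throw new ArgumentException("Expected an https:// URL.", nameof(url));
        }

        // Assumption: some older .NET Framework versions need TLS 1.2 enabled explicitly.
        ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;

        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8;   // decode the response body as UTF-8
            // Placeholder User-Agent; some sites reject requests that send none.
            client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (compatible; ExampleScraper/1.0)";
            return client.DownloadString(uri);
        }
    }
}
```

DownloadString throws a WebException if the connection or the TLS handshake fails, so a real scraper would probably wrap the call in a try/catch. Note that WebClient is marked obsolete in recent .NET versions, although it still works there.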

Upvotes: 5
