Lilz

Reputation: 4091

Extract a web document using C#

I am trying to get data from a web page using C#.

So far this is my code:

WebBrowser wb = new WebBrowser();

wb.Url = new Uri("http://www.microsoft.com");
HtmlDocument doc = wb.Document;

MessageBox.Show(doc.ToString());

Unfortunately wb remains null and the Url property never gets set.

Can anyone help me please?

Thanks

Upvotes: 0

Views: 4594

Answers (4)

Ramanand Bhat

Reputation: 183

First, call the Navigate method with the URL, and subscribe to the WebBrowser control's DocumentCompleted event:

 webBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser_DocumentCompleted);
 webBrowser.Navigate("http://www.microsoft.com");

Then use the DocumentCompleted handler to read the loaded page:

 void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
 {
     // DocumentText is already a string, so no ToString() call is needed.
     MessageBox.Show(webBrowser.DocumentText);
 }
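
To show how the two fragments fit together, here is a minimal sketch of a complete form. The class name PageViewerForm and the handler name are made up for illustration, and the sketch assumes a Windows Forms project; treat it as one possible arrangement rather than the only way to wire this up.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form name; any Form that owns the WebBrowser control would do.
public class PageViewerForm : Form
{
    // Class-scope field so the control outlives the code that starts navigation.
    private readonly WebBrowser webBrowser = new WebBrowser();

    public PageViewerForm()
    {
        webBrowser.Dock = DockStyle.Fill;
        Controls.Add(webBrowser);

        // Subscribe before navigating so the completion event cannot be missed.
        webBrowser.DocumentCompleted += WebBrowser_DocumentCompleted;
        webBrowser.Navigate("http://www.microsoft.com");
    }

    private void WebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentText holds the HTML source of the loaded page.
        // Note: on pages with frames this event can fire more than once.
        MessageBox.Show(webBrowser.DocumentText);
    }
}
```

The form would be started from an [STAThread] Main method with Application.Run(new PageViewerForm());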

Hope this helps...

Upvotes: 1

Paul Sasik

Reputation: 81489

I think you're forgetting to initiate the navigation to the page.

But also...

Navigation is asynchronous. Declare the WebBrowser object at class scope and then handle its Navigated event like this:

    private void webBrowser1_Navigated(object sender, WebBrowserNavigatedEventArgs e)
    {
        // Document is available here, though the page may still be loading.
        HtmlDocument doc = webBrowser1.Document;
    }

You'll see that doc is non-null in the handler.

Upvotes: 0

Wayne Hartman

Reputation: 18477

I would use the WebClient class instead of the WebBrowser control. WebBrowser is meant for interaction with a UI, whilst WebClient is geared towards programmatic interaction with a page. Here is some example code (it uses HttpWebRequest, the lower-level request class from the same System.Net namespace):

private void sendMessage(JaxtrSmsMessage message)
{
    HttpWebRequest request;
    HttpWebResponse response;
    CookieContainer cookies;
    string url = "http://www.jaxtr.com/user/login.jsp";

    try
    {
        request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = true;
        request.CookieContainer = new CookieContainer();
        response = (HttpWebResponse)request.GetResponse();
        if (response.StatusCode == HttpStatusCode.OK)
        {
            StringBuilder sb = new StringBuilder();
            StreamReader reader = new StreamReader(response.GetResponseStream());
            while (!reader.EndOfStream)
            {
                sb.AppendLine(reader.ReadLine());
            }

            //Get the hidden value out of the form.                
            String fp = Regex.Match(sb.ToString(), "\"__fp\"\\svalue=\"(([A-Za-z0-9+/=]){4}){1,19}\"", RegexOptions.None).Value;
            fp = fp.Substring(14);
            fp = fp.Replace("\"", String.Empty);


            cookies = request.CookieContainer;
            response.Close(); // release the connection before issuing the next request
            String requestString = "http://www.jaxtr.com/user/Login.action?tzOffset=6&navigateURL=&refPage=&jaxtrId=" + HttpUtility.UrlEncode(credentials.Username) + "&password=" + HttpUtility.UrlEncode(credentials.Password) + "&Login=Login&_sourcePage=%2Flogin.jsp&__fp="+HttpUtility.UrlEncode(fp);
            request = (HttpWebRequest)WebRequest.Create(requestString);
            request.CookieContainer = cookies; //added by myself

            response = (HttpWebResponse)request.GetResponse();
            Console.WriteLine("Response from login:" + response.StatusCode);

            String messageText = (message.TruncateMessage && message.MessageText.Length > JaxtrSmsMessage.MAX_MESSAGE_LENGTH ? message.MessageText.Substring(0, JaxtrSmsMessage.MAX_MESSAGE_LENGTH) : message.MessageText);

            String messageURL = "http://www.jaxtr.com/user/sendsms?CountryName=" + HttpUtility.UrlEncode(message.CountryName) + "&phone=" + HttpUtility.UrlEncode(message.DestinationPhoneNumber) + "&message=" + HttpUtility.UrlEncode(messageText) + "&bySMS=" + HttpUtility.UrlEncode(message.BySMS.ToString().ToLower());

            request = (HttpWebRequest)WebRequest.Create(messageURL);
            request.CookieContainer = cookies;
            response = (HttpWebResponse)request.GetResponse();

            Console.WriteLine("Response from send SMS command=" + response.StatusCode);

            StringBuilder output = new StringBuilder();

            using (Stream s = response.GetResponseStream())
            {
                StreamReader sr = new StreamReader(s);
                while (!sr.EndOfStream)
                {
                    output.AppendLine(sr.ReadLine());
                }
            }
            response.Close();
        }
        else
        {
            Console.WriteLine("Client was unable to connect!");
        }
    }
    catch (System.Exception e)
    {
        throw new SMSDeliveryException("Unable to deliver SMS message because "+e.Message, e);
    }
}
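
For the simpler task in the question (just getting a page's HTML), a much shorter sketch using WebClient itself might look like the following. The class name PageFetcher and both method names are made up for illustration. The second method shows one alternative to the Substring(14) step above: a named capture group that pulls the hidden __fp value straight out of the match, assuming the same markup shape as the regular expression in the code above.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class PageFetcher
{
    // Downloads the resource at the given address and returns its body as a string.
    public static string FetchPage(string address)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(address);
        }
    }

    // Returns the value of the hidden "__fp" field, or null if it is not found.
    // Same pattern as in the code above, with a named group around the value.
    public static string ExtractFpValue(string html)
    {
        Match match = Regex.Match(
            html,
            "\"__fp\"\\svalue=\"(?<fp>(?:[A-Za-z0-9+/=]{4}){1,19})\"");
        return match.Success ? match.Groups["fp"].Value : null;
    }
}
```

Usage would be along the lines of string html = PageFetcher.FetchPage("http://www.microsoft.com"); followed by whatever parsing the page needs.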

Upvotes: 5

Ray

Reputation: 21905

Try the DocumentText property of the WebBrowser control, rather than calling ToString() on the HtmlDocument.

Upvotes: -1
