user774326

Reputation: 117

Download the HTML code rendered by ASP.NET web sites

I have to download and parse a website which is rendered by ASP.NET. If I use the code below I only get half of the page without the rendered "content" that I need. I would like to get the full content that I can see with Firebug or the IE Developer Tool.

How can I do this? I didn't find a solution.

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse response = (HttpWebResponse)req.GetResponse();
StreamReader streamReader = new StreamReader(response.GetResponseStream());
string code = streamReader.ReadToEnd();

Thank you!

UPDATE

I tried the WebBrowser control solution, but it didn't work. I'm working in a WPF project and use the following code, and I don't even get the content of the website. I don't see my mistake right now :( .

System.Windows.Forms.WebBrowser webBrowser = new System.Windows.Forms.WebBrowser();
Uri uri = new Uri(myAdress);

webBrowser.AllowNavigation = true;
webBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
webBrowser.Navigate(uri);

private void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    System.Windows.Forms.WebBrowser wb = sender as System.Windows.Forms.WebBrowser;
    string tmp = wb.DocumentText;
}

UPDATE 2

That's the code I came up with in the meantime. However, I don't get any output: my elementCollection doesn't return any values. If I can get the HTML source as a string, I'd be happy to parse it with the HtmlAgilityPack. (I don't want to incorporate the browser into my XAML code.)

Sorry for getting on your nerves!

Thank you!

WebBrowser wb = new WebBrowser();
wb.Source = new Uri(MyURL);        
HTMLDocument doc = (HTMLDocument)wb.Document;
IHTMLElementCollection elementCollection = doc.getElementsByName("body");

    foreach (IHTMLElementCollection element in elementCollection)
    {
        tb.Text = element.toString();
    }

Upvotes: 1

Views: 4480

Answers (6)

Ihtsham Minhas

Reputation: 1515

I would recommend using the following rendering engine instead of the WebBrowser control:

https://github.com/cefsharp/CefSharp

Upvotes: 0

flop_coder

Reputation: 11

You can try this:

// Goes in the page's code-behind; needs System.IO, System.Text and System.Web.UI.
protected override void Render(HtmlTextWriter writer)
{
    // Render the page into an in-memory buffer instead of the response stream.
    StringBuilder renderedOutput = new StringBuilder();
    StringWriter strWriter = new StringWriter(renderedOutput);
    HtmlTextWriter tWriter = new HtmlTextWriter(strWriter);
    base.Render(tWriter);

    string html = renderedOutput.ToString();

    // Save a copy of the rendered HTML next to the page.
    string filename = Server.MapPath(".") + "\\data.txt";
    using (FileStream outputStream = new FileStream(filename, FileMode.Create))
    using (StreamWriter sWriter = new StreamWriter(outputStream))
    {
        sWriter.Write(html);
    }

    // Render for output
    writer.Write(html);
}

Upvotes: 1

Jan Aagaard

Reputation: 11184

The answer might be that the content of the web site is rendered with JavaScript, probably with some AJAX calls that fetch additional data from the server to build the content. Firebug and the IE Developer Tool will show you the rendered HTML, but if you choose 'view source', you should see the same HTML as the one that you fetch with your code.

I would use a tool like the Fiddler Web Debugger to monitor what the page downloads when it is rendered. You might be able to get the needed content by simulating the AJAX requests that the page makes.

Note that it can be a pain to simulate browsing an ASP.NET web site if the navigation is done with postbacks, because you will need to include the values of all the form elements (including the hidden view state) when simulating clicks on links.
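As a rough illustration of the postback point, here is a minimal sketch of collecting the hidden form fields from a downloaded page and building the form body for a simulated link click. The class and method names (`PostbackSketch`, `ExtractHiddenFields`, `BuildPostBody`) are made up for this example. `__VIEWSTATE`, `__EVENTTARGET` and `__EVENTARGUMENT` are the usual Web Forms field names. The regex is a simplification that assumes `type`, `name` and `value` appear in that order; for real pages a parser such as the HtmlAgilityPack would be more robust.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class PostbackSketch
{
    // Collects the name/value pairs of <input type="hidden"> elements.
    // Assumption: attributes appear as type, name, value (in that order).
    public static Dictionary<string, string> ExtractHiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        var pattern = new Regex(
            "<input[^>]*type=\"hidden\"[^>]*name=\"([^\"]*)\"[^>]*value=\"([^\"]*)\"",
            RegexOptions.IgnoreCase);

        foreach (Match m in pattern.Matches(html))
        {
            // Attribute values are HTML-encoded in the page source.
            fields[m.Groups[1].Value] = WebUtility.HtmlDecode(m.Groups[2].Value);
        }
        return fields;
    }

    // Builds an application/x-www-form-urlencoded body that mimics a click
    // on the control whose postback target is eventTarget.
    public static string BuildPostBody(IDictionary<string, string> fields, string eventTarget)
    {
        var all = new Dictionary<string, string>(fields);
        all["__EVENTTARGET"] = eventTarget;
        if (!all.ContainsKey("__EVENTARGUMENT"))
        {
            all["__EVENTARGUMENT"] = "";
        }

        return string.Join("&", all.Select(kv =>
            Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value)));
    }
}
```

The resulting string would then be sent as the body of a POST request to the same page, with the content type set to `application/x-www-form-urlencoded`. The event target value comes from the `__doPostBack('...')` call in the link you want to simulate.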

Upvotes: 2

Jacob

Reputation: 78840

Your code should be downloading the entire page. However, the page may, through JavaScript, add content after it's been loaded. Unless you actually run that JavaScript in a web browser, you won't see the entire DOM you see in Firebug.

Upvotes: 1

M4N

Reputation: 96561

Probably not an answer, but you might use the WebClient class to simplify your code:

WebClient client = new WebClient();
string html = client.DownloadString(URL);

Upvotes: 1

sternr

Reputation: 6506

If the page you're referring to has IFrames or other dynamic loading mechanisms, using HttpWebRequest wouldn't be enough. A better solution would be (if possible) to use a WebBrowser control.
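A minimal sketch of that idea, assuming the Windows Forms `WebBrowser` control is acceptable and the project references `System.Windows.Forms`. It runs the control on its own STA thread with a message loop, so nothing has to be added to the XAML. `RenderedHtml` and `Fetch` are names invented for this example. Note that `DocumentCompleted` can fire before late AJAX calls have finished, so some content may still be missing when the HTML is read.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

public static class RenderedHtml
{
    // Loads the page in a hidden WebBrowser control and returns the DOM
    // as HTML after scripts that run during load have executed.
    public static string Fetch(string url)
    {
        string html = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser())
            {
                browser.ScriptErrorsSuppressed = true;
                browser.DocumentCompleted += (sender, e) =>
                {
                    // The event also fires for frames; wait until the
                    // whole document reports that it is complete.
                    if (browser.ReadyState != WebBrowserReadyState.Complete)
                    {
                        return;
                    }

                    // OuterHtml of the root element reflects the current DOM,
                    // not the source that was originally downloaded.
                    html = browser.Document.GetElementsByTagName("html")[0].OuterHtml;
                    Application.ExitThread();
                };

                browser.Navigate(url);

                // The control needs a message loop to navigate and raise events.
                Application.Run();
            }
        });

        // The underlying ActiveX control requires a single-threaded apartment.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        return html;
    }
}
```

The returned string can be passed to the HtmlAgilityPack for parsing. This sketch has no timeout, so a page that never finishes loading would block the caller; a production version would need one.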

Upvotes: 3
