Reputation: 3152
I've wasted 2 days finding out that there's a known memory leak in the WebBrowser control (since 2007 or so, and they still haven't fixed it), so I've decided to simply ask here how to do what I need.
Until now (using WebBrowser), I've been visiting a site, selecting everything (Ctrl+A), and pasting it into a string. That was all: I had the text content of a web page in my string. It worked perfectly until I found out that it takes 1 GB of memory after some time. Is it possible to do this through HttpWebRequest, WebClient, or anything else?
Thanks for any replies. There wasn't any thread like this (or I haven't found one; I didn't search for long because I'm really annoyed right now :P).
FORGOT TO ADD: I don't want the HTML code; I know it's easy to get. In my case the HTML code is useless. I need the text the user sees when opening the page in a web browser.
Upvotes: 0
Views: 1965
Reputation: 2534
Why don't you use a free, open-source HTML scraper such as NCrawler?
It is written in C#.
You can get examples on how to use it here.
Upvotes: 1
Reputation: 10994
You can use this:
string getHtml(string url) {
    // Requires: using System.IO; using System.Net;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    // The using blocks dispose the response and reader even if reading throws.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader source = new StreamReader(response.GetResponseStream())) {
        return source.ReadToEnd();
    }
}
You still have to do some substring replacement to reduce the HTML to plain text. That's not too bad if you only want the text from a certain div.
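As a rough illustration of that reduction step, here is a minimal sketch that strips markup with regular expressions. The class and method names (HtmlText, ToPlainText) are made up for this example and are not part of any library. It assumes reasonably well-formed HTML and does not run JavaScript, so the result can differ from what a browser actually shows; a real HTML parser would be more robust.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class HtmlText
{
    public static string ToPlainText(string html)
    {
        // Remove script and style blocks; their contents are never displayed as text.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Remove HTML comments.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);

        // Replace every remaining tag with a space so adjacent words stay separated.
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Turn entities such as &amp; back into the characters they stand for.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

You could then call it on the downloaded page, for example string text = HtmlText.ToPlainText(getHtml(url));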
Upvotes: 2
Reputation: 7692
This will download the HTML content of a web page.
WebClient client = new WebClient ();
string reply = client.DownloadString ("http://www.google.com");
Upvotes: 2
Reputation: 116108
using (WebClient client = new WebClient())
{
string html = client.DownloadString("http://stackoverflow.com/questions/10839877/how-to-get-a-txt-content-of-a-web-page");
}
Upvotes: 7