Reputation: 19570
I'm revisiting some old code of mine and have stumbled upon a method for getting the title of a website based on its URL. It's not what you would call a stable method: it often fails to produce a result and sometimes even produces incorrect results. It also sometimes fails to show some of the characters in the title because they are in a different encoding.
Does anyone have suggestions for improvements over this old version?
public static string SuggestTitle(string url, int timeout)
{
    WebResponse response = null;
    string line = string.Empty;
    try
    {
        WebRequest request = WebRequest.Create(url);
        request.Timeout = timeout;
        response = request.GetResponse();
        Stream streamReceive = response.GetResponseStream();
        Encoding encoding = System.Text.Encoding.GetEncoding("utf-8");
        StreamReader streamRead = new System.IO.StreamReader(streamReceive, encoding);
        while(streamRead.EndOfStream != true)
        {
            line = streamRead.ReadLine();
            if (line.Contains("<title>"))
            {
                line = line.Split(new char[] { '<', '>' })[2];
                break;
            }
        }
    }
    catch (Exception) { }
    finally
    {
        if (response != null)
        {
            response.Close();
        }
    }
    return line;
}
One final note: I would like the code to run faster as well. It currently blocks until the page has been fetched, so it would be great if I could get only the site header and not the entire page.
Upvotes: 18
Views: 29296
Reputation: 31845
A simpler way to get the content:
WebClient x = new WebClient();
string source = x.DownloadString("http://www.singingeels.com/");
A simpler, more reliable way to get the title:
string title = Regex.Match(source, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>",
    RegexOptions.IgnoreCase).Groups["Title"].Value;
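Since the question also asks about not fetching the entire page, here is a rough sketch of how the same regex could be applied to a stream that is read only until the title shows up. The names TitleSketch, ReadTitle and maxChars are made up for illustration, and the entity decoding via WebUtility.HtmlDecode is an addition that is not part of the snippet above. It re-runs the regex on everything read so far after each chunk, which is wasteful but bounded by maxChars.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class TitleSketch
{
    // Same pattern as above, minus the unnecessary escapes before < and >.
    private static readonly Regex TitlePattern = new Regex(
        @"<title\b[^>]*>\s*(?<Title>[\s\S]*?)</title>",
        RegexOptions.IgnoreCase);

    // Reads from the reader in small chunks and stops as soon as a complete
    // <title> element has been seen, or once maxChars characters were read.
    // Returns an empty string when no title was found within that limit.
    public static string ReadTitle(TextReader reader, int maxChars)
    {
        var seen = new StringBuilder();
        var buffer = new char[1024];
        while (seen.Length < maxChars)
        {
            int toRead = Math.Min(buffer.Length, maxChars - seen.Length);
            int read = reader.Read(buffer, 0, toRead);
            if (read <= 0)
            {
                break; // end of stream
            }
            seen.Append(buffer, 0, read);

            Match match = TitlePattern.Match(seen.ToString());
            if (match.Success)
            {
                // Decode entities such as &amp; and drop surrounding whitespace.
                return WebUtility.HtmlDecode(match.Groups["Title"].Value).Trim();
            }
        }
        return string.Empty;
    }
}
```

With a real response you would probably wrap the response stream in a StreamReader and pass that in, for example ReadTitle(new StreamReader(response.GetResponseStream(), Encoding.UTF8), 65536). UTF-8 is only a guess here; picking the encoding from the charset the server reports may help with the missing characters mentioned in the question.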
Upvotes: 52
Reputation: 582
I had the same question and ended up using the Html Agility Pack, which may open up new possibilities for you.
Download "Html Agility Pack" from http://html-agility-pack.net/?z=codeplex
Alternatively, get it from NuGet: https://www.nuget.org/packages/HtmlAgilityPack/ and add it as a reference to your project.
Add the following using directive to the code file:
using HtmlAgilityPack;
Write the following code in your method:
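var webGet = new HtmlWeb();
var document = webGet.Load(url);
// SelectSingleNode returns null when the page has no <title> element,
// so check for null before reading InnerText if that can happen.
var title = document.DocumentNode.SelectSingleNode("html/head/title").InnerText;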
Sources:
https://codeshare.co.uk/blog/how-to-scrape-meta-data-from-a-url-using-htmlagilitypack-in-c/
HtmlAgilityPack obtain Title and meta
Upvotes: 10
Reputation: 54854
In order to accomplish this, you are going to need to do a couple of things.
I have done this before with SEO bots, and I was able to handle almost 10,000 requests at a time. You just need to make sure that each web request is self-contained in its own thread.
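To give an idea of what keeping each request self-contained might look like, here is a minimal sketch. It uses tasks with a SemaphoreSlim limit rather than raw threads, which is a substitution on my part, and the names ParallelTitles, FetchAllAsync, fetchTitle and maxConcurrency are hypothetical. The actual download is passed in as a delegate, so any title-fetching method could be plugged in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelTitles
{
    // Runs fetchTitle for every distinct URL, with at most maxConcurrency
    // requests in flight at once. A failing URL maps to an empty title
    // instead of taking the other requests down with it.
    public static async Task<Dictionary<string, string>> FetchAllAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetchTitle,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = urls.Distinct().Select(async url =>
            {
                await gate.WaitAsync();
                try
                {
                    string title = await fetchTitle(url);
                    return new KeyValuePair<string, string>(url, title);
                }
                catch (Exception)
                {
                    // Each request handles its own failure.
                    return new KeyValuePair<string, string>(url, string.Empty);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            KeyValuePair<string, string>[] pairs = await Task.WhenAll(tasks);
            return pairs.ToDictionary(p => p.Key, p => p.Value);
        }
    }
}
```

Whether this gets anywhere near the request volume mentioned above depends on the network and the sites being fetched; the sketch only shows the isolation between requests.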
Upvotes: -1