Prabodha Eranga

Reputation: 11

Read HTML from C# WinForms

I need to read the title of a web site from a C# WinForms application. What is the best way to do it? I searched on Google but didn't find anything.

Thanks in advance.

Upvotes: 1

Views: 3024

Answers (3)

peterthegreat

Reputation: 406

You want to use the WebClient class, found in the System.Net namespace.

using System.Net;

With WebClient you can download a whole web page as a string and then do whatever you want with that string. :)

Example:

WebClient client = new WebClient();
string content = client.DownloadString("http://www.google.com");

Then just parse the string any way you want. :) In this example you might want to find the title element and extract the title like this:

string title = content.Substring(content.IndexOf("<title>"), content.IndexOf("</title>") - content.IndexOf("<title>")).Replace("<title>", "").Trim();
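Note that the one-liner above assumes the page contains a lowercase <title> tag with no attributes; if it doesn't, IndexOf returns -1 and Substring throws. A slightly more defensive sketch is below. TitleParser and ExtractTitle are hypothetical names made up for illustration, and this is still plain string searching, not real HTML parsing:

```csharp
using System;

static class TitleParser
{
    // Hypothetical helper (not part of any library): returns the text between
    // the first <title ...> and </title>, or null if either tag is missing.
    public static string ExtractTitle(string html)
    {
        if (string.IsNullOrEmpty(html)) return null;

        // Case-insensitive search so <TITLE> and <title lang="en"> are found too.
        int open = html.IndexOf("<title", StringComparison.OrdinalIgnoreCase);
        if (open < 0) return null;

        // Move past the '>' that closes the opening tag.
        int start = html.IndexOf('>', open);
        if (start < 0) return null;
        start++;

        int end = html.IndexOf("</title", start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        return html.Substring(start, end - start).Trim();
    }
}
```

You would then call it as TitleParser.ExtractTitle(content) and check the result for null before using it.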

Hope it helps. :)

Upvotes: 4

Darin Dimitrov

Reputation: 1038730

Personally I like and use SgmlReader to parse HTML:

using System;
using System.IO;
using System.Net;
using System.Xml;
using Sgml;

class Program
{
    static void Main()
    {
        var url = "http://www.stackoverflow.com";
        using (var reader = new SgmlReader())
        using (var client = new WebClient())
        using (var streamReader = new StreamReader(client.OpenRead(url)))
        {
            reader.DocType = "HTML";
            reader.WhitespaceHandling = WhitespaceHandling.All;
            reader.CaseFolding = Sgml.CaseFolding.ToLower;
            reader.InputStream = streamReader;

            var doc = new XmlDocument();
            doc.PreserveWhitespace = true;
            doc.XmlResolver = null;
            doc.Load(reader);
            var title = doc.SelectSingleNode("//title");
            if (title != null)
            {
                Console.WriteLine(title.InnerText);
            }
        }
    }
}

Upvotes: 0

Shekhar_Pro

Reputation: 18420

If you need to parse the whole web page, you can try the HTML Agility Pack. If all you need is the title, a regular expression will do.

Since the title is almost always inside the <title> tag, you can extract it directly.

To download the HTML, you can use a WebClient or the HttpWebRequest/HttpWebResponse objects.
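As a rough sketch of the regular expression approach: the pattern and the names TitleRegex and ExtractTitle below are my own assumptions, not something from a library, and a regex like this will not cope with every malformed page. WebUtility.HtmlDecode needs .NET 4.0 or later.

```csharp
using System.Net;
using System.Text.RegularExpressions;

static class TitleRegex
{
    // Assumed pattern: an opening <title> tag (attributes allowed), then the
    // shortest possible run of text, then the closing tag.
    // Singleline lets '.' match newlines in case the title spans several lines.
    static readonly Regex TitlePattern = new Regex(
        @"<title\b[^>]*>(.*?)</title\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns the decoded, trimmed title, or null if no match.
    public static string ExtractTitle(string html)
    {
        Match m = TitlePattern.Match(html ?? string.Empty);
        if (!m.Success) return null;

        // Turn entities such as &amp; back into plain characters.
        return WebUtility.HtmlDecode(m.Groups[1].Value).Trim();
    }
}
```

Combined with WebClient, usage would look something like: string html = new WebClient().DownloadString(url); string title = TitleRegex.ExtractTitle(html);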

Upvotes: 0
