mpen

Reputation: 283293

C# version of HTML Tidy?

I am just looking for a really easy way to clean up some HTML (possibly with embedded JavaScript code). I tried two different HTML Tidy .NET ports and both are throwing exceptions...

Sorry, by "clean" I mean "indent". The HTML is not malformed, at all. It's XHTML strict.


I finally got something working with SGML, but this is seriously the most ridiculous chunk of code ever to indent some HTML.

// Requires: using System.IO; using System.Xml; using Sgml;
private static string FormatHtml(string input)
{
    var sgml = new SgmlReader { DocType = "HTML", InputStream = new StringReader(input) };
    using (var sw = new StringWriter())
    {
        using (var xw = new XmlTextWriter(sw) { Indentation = 2, Formatting = Formatting.Indented })
        {
            sgml.Read();
            while (!sgml.EOF)
                xw.WriteNode(sgml, true);
        }
        // Read the result once the XmlTextWriter has been disposed, while sw is still in scope.
        return sw.ToString();
    }
}

Upvotes: 10

Views: 25654

Answers (6)

bh_earth0

Reputation: 2842

Use AngleSharp, which is 100% C#:

    var parser = new HtmlParser();
    
    var document = parser.ParseDocument("<html><head></head><body><i></i></body></html>");

    var sw = new StringWriter();
    document.ToHtml(sw, new PrettyMarkupFormatter());

    var HTML_prettified = sw.ToString();

Edit by sebastian:

    // Old parse method:
    var document = parser.Parse("<html><head></head><body><i></i></body></html>");

    // New parse method (for AngleSharp 0.16.1):
    var document = await parser.ParseDocumentAsync(Code);
 

Upvotes: 18

Saeed As

Reputation: 9

js-beautify includes an HTML beautifier; I used its html_beautify function. For example:

const beautified = html_beautify("<div><p></p></div>");
console.log(beautified)
<script src="https://cdnjs.cloudflare.com/ajax/libs/js-beautify/1.14.0/beautify-html.min.js"></script>

Upvotes: 0

educoutinho

Reputation: 929

You can use HtmlAgilityPack (add the package from NuGet).

Code sample:

string html = "<div><p>line 1<br>line 2</p><span></div>";
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(html); // load the string declared above
var fixedHtml = htmlDoc.DocumentNode.OuterHtml;

Output:

<div><p>line 1<br />line 2</p><span></span></div>

Upvotes: 1

wonea

Reputation: 4969

The latest C# wrapper for HTML Tidy is by Mark Beaton, and it seems considerably more up to date than the links you've referenced (2003). Also worth noting: Mark provides the executables to reference as well, so you don't have to pull them from the official site. That should do the trick for nicely organising and validating your HTML.
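As a rough sketch of how such a wrapper might be used for indenting: this assumes the TidyManaged package and a native libtidy build are available, and the property names follow the project's README as best I recall, so check them against the version you install. The class and method names are made up for illustration.

```csharp
using System;
using TidyManaged; // Mark Beaton's wrapper; needs the native libtidy library at run time

public static class TidyExample
{
    // Hypothetical helper, not part of the library.
    public static string Indent(string html)
    {
        using (Document doc = Document.FromString(html))
        {
            doc.ShowWarnings = false;
            doc.Quiet = true;
            doc.OutputXhtml = true;                  // emit XHTML
            doc.IndentBlockElements = AutoBool.Yes;  // assumed option: indent block-level elements
            doc.IndentSpaces = 2;                    // assumed option: spaces per indent level
            doc.CleanAndRepair();
            return doc.Save();                       // serialize the cleaned document to a string
        }
    }
}
```

Calling `TidyExample.Indent(someHtml)` should then return the tidied markup as a string.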

Upvotes: 10

Nick Martyshchenko

Reputation: 4249

UPDATE:

Check HtmlTextWriter or XhtmlTextWriter (usage: "Formatting Html Output with HtmlTextWriter"). Maybe constructing the HTML via HtmlTextWriter would work better?

Also check: "LINQ & Lambda, Part 3: Html Agility Pack to LINQ to XML Converter".

See also http://www.manoli.net/csharpformat/ (the source code is available, in case you missed it).


Maybe you want to do it yourself? This project can be helpful: Html Agility Pack

What is exactly the Html Agility Pack (HAP)?

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).

Html Agility Pack now supports Linq to Objects (via a LINQ to Xml Like interface). Check out the new beta to play with this feature

Sample applications:

  • Page fixing or generation. You can fix a page the way you want, modify the DOM, add nodes, copy nodes, well... you name it.

  • Web scanners. You can easily get to img/src or a/hrefs with a bunch XPATH queries.

  • Web scrapers. You can easily scrap any existing web page into an RSS feed for example, with just an XSLT file serving as the binding. An example of this is provided.


Also you can try this implementation: A managed wrapper for the HTML Tidy library

Upvotes: 1

Abe Miessler

Reputation: 85126

I've used SGML Reader to convert HTML to XHTML in the past. Might be worth looking into...

I never had any problems with it when I was using it.
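A minimal sketch of that kind of conversion, assuming the SgmlReader library (namespace `Sgml`) is referenced. The class and method names are made up for illustration, and the output is whatever LINQ to XML produces by default rather than anything tuned:

```csharp
using System.IO;
using System.Xml.Linq;
using Sgml; // SgmlReader library

public static class SgmlExample
{
    // Hypothetical helper, not part of the library.
    public static string ToXhtml(string html)
    {
        using (var input = new StringReader(html))
        {
            var reader = new SgmlReader
            {
                DocType = "HTML",                   // use the built-in HTML DTD
                CaseFolding = CaseFolding.ToLower,  // normalize tag names to lower case
                InputStream = input
            };
            // SgmlReader is an XmlReader, so LINQ to XML can load from it directly.
            XDocument doc = XDocument.Load(reader);
            // XDocument.ToString() indents the markup by default.
            return doc.ToString();
        }
    }
}
```

Because the result is a regular XDocument, it can also be saved through an XmlWriter with custom XmlWriterSettings if different indentation is wanted.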

Upvotes: 1
