JSS

Reputation: 404

HTML to Text Conversion

Internet Explorer has an option to save a web page as a text file, with all the tags removed. I need a way to batch process pages like that for a project at work. Are there any command-line utilities or libraries that can do the same thing for me? COM interop with IE (not my first choice!)? It doesn't have to format the output exactly like IE does, just give me plain text.

Upvotes: 1

Views: 1158

Answers (3)

SLaks

Reputation: 887365

You can do this in C# using the HTML Agility Pack:

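using System.IO;
using HtmlAgilityPack;

var doc = new HtmlWeb().Load(url);                     // download and parse the page
File.WriteAllText(path, doc.DocumentNode.InnerText);   // write the text content, tags removed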

Upvotes: 0

Joey

Reputation: 354456

I once saw a script that used lynx to render HTML as plain text, in order to generate a plain-text mail automatically from an HTML one. Not my first choice either, though.

Upvotes: 0

Matthew Flaschen

Reputation: 284786

There are many programs that do this. Some are called html2text. There's this one (which isn't available natively for Windows, but compiles under Cygwin), and another that is for Win32.
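If installing a separate tool is not an option, the core of what such a converter does can be roughed out with a standard-library parser. The following is only a minimal sketch in Python, not how any of the html2text programs actually work: the names `TextExtractor`, `html_to_text` and `convert_dir` are made up for illustration, the list of block-level tags is an arbitrary assumption, and the input files are assumed to be UTF-8.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}
    # Assumed set of tags that should start a new line in the output.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def convert_dir(src_dir):
    """Write a .txt file next to every .html file in src_dir."""
    for path in Path(src_dir).glob("*.html"):
        html = path.read_text(encoding="utf-8", errors="replace")
        path.with_suffix(".txt").write_text(html_to_text(html), encoding="utf-8")
```

Calling `convert_dir("pages")` (the directory name is just an example) would batch process a folder. The output will not match IE's formatting, since tables, links and list markers are all flattened to plain lines.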

Upvotes: 1
