Reputation: 404
Internet Explorer has an option to save a web page as a text file with all the tags removed. I need a way to batch process this for a project at work. Are there any command-line utilities or libraries that can do the same thing for me? COM interop with IE (not my first choice!)? The output doesn't have to match IE's formatting exactly; I just need plain text.
Upvotes: 1
Views: 1158
Reputation: 887365
You can do this in C# using the HTML Agility Pack:
// Requires: using System.IO; using HtmlAgilityPack;
var doc = new HtmlWeb().Load(url);
File.WriteAllText(path, doc.DocumentNode.InnerText);
Upvotes: 0
Reputation: 354456
I once saw a script that used lynx to render HTML to plain text, in order to automatically generate a plain-text mail from an HTML one. It's not my first choice either, though.
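For batch use, something along these lines might work. It is only a rough sketch: it assumes lynx is on the PATH and a POSIX shell is available (on Windows, for example, through Cygwin), and the function name html_dir_to_text and the directory layout are made up for illustration. It relies on lynx's -dump option, which writes the rendered page to standard output, and -nolist, which leaves out the numbered link list.

```shell
#!/bin/sh
# Convert every .html file in a directory to plain text with lynx.
# html_dir_to_text is a hypothetical helper name, not part of lynx.
# Usage: html_dir_to_text <input_dir> <output_dir>
html_dir_to_text() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for src in "$in_dir"/*.html; do
        # Skip the unexpanded pattern when the directory has no .html files.
        [ -e "$src" ] || continue
        name=$(basename "$src" .html)
        # -dump prints the rendered page to stdout; -nolist omits the link list.
        lynx -dump -nolist "$src" > "$out_dir/$name.txt"
    done
}
```

After sourcing the file, calling html_dir_to_text ./pages ./text would write one .txt file per .html file. The output will not match IE's "save as text" formatting exactly, since lynx does its own layout.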
Upvotes: 0
Reputation: 284786
There are many programs that do this. Some are called html2text. There's this one (which isn't available natively for Windows, but compiles under Cygwin), and another that is for Win32.
Upvotes: 1