vaab

Reputation: 10122

Looking to convert HTML to ASCII text (ANSI if possible) in Python

I'm having trouble finding a library that converts simple HTML (with <b>, <i>, <p>, <li> ...) to a plain-text representation. Obviously this can't follow the HTML spec very far, but I don't need anything fancy. For instance, lynx does the job well, except that bold and italic are ignored, whereas they could probably be translated into ANSI attributes:

$ echo "<b>hello</b> <p>this is a <i>list</i> <ul><li>foo</li><li>bar</li></ul></p>" |
    lynx -stdin  -dump
hello

this is a list
  * foo
  * bar

The ideal solution would be a Python library. Otherwise I will stick with lynx. Any command that does better than the one shown here would also be accepted.

Upvotes: 2

Views: 489

Answers (1)

vaab

Reputation: 10122

There is html2text, which is not quite what I wanted but could fit other readers' constraints.

It produces text from HTML, and that text follows the Markdown format, so it makes no use of ANSI attributes, for instance. However, since Markdown is meant to be readable as plain text, it will probably satisfy some needs.
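
If ANSI attributes really are needed, one possible direction is a small converter built on the standard library's html.parser. The following is only a rough sketch under stated assumptions, not how html2text or lynx work: the names html_to_ansi and AnsiTextRenderer are hypothetical, only <b>, <strong>, <i>, <em>, <p>, <ul>, <ol> and <li> are handled, and the terminal is assumed to understand the SGR codes for bold (1) and italic (3).

```python
from html.parser import HTMLParser

# ANSI SGR escape sequences (assumption: the terminal supports them).
BOLD, ITALIC, RESET = "\033[1m", "\033[3m", "\033[0m"


class AnsiTextRenderer(HTMLParser):
    """Hypothetical helper: renders a tiny HTML subset as ANSI text."""

    STYLES = {"b": BOLD, "strong": BOLD, "i": ITALIC, "em": ITALIC}

    def __init__(self):
        super().__init__()
        self.parts = []   # output fragments, joined at the end
        self.active = []  # style tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.STYLES:
            self.active.append(tag)
            self.parts.append(self.STYLES[tag])
        elif tag == "p":
            self.parts.append("\n\n")      # blank line before a paragraph
        elif tag == "li":
            self.parts.append("\n  * ")    # lynx-like bullet

    def handle_endtag(self, tag):
        if tag in self.STYLES and tag in self.active:
            self.active.remove(tag)
            # Reset all attributes, then re-apply the styles still open.
            still_open = "".join(self.STYLES[t] for t in self.active)
            self.parts.append(RESET + still_open)
        elif tag in ("p", "ul", "ol"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        return "".join(self.parts).strip()


def html_to_ansi(html):
    """Convert a small HTML fragment to text with ANSI bold/italic."""
    renderer = AnsiTextRenderer()
    renderer.feed(html)
    renderer.close()
    return renderer.text()


if __name__ == "__main__":
    sample = ("<b>hello</b> <p>this is a <i>list</i> "
              "<ul><li>foo</li><li>bar</li></ul></p>")
    print(html_to_ansi(sample))
```

This does no line wrapping, whitespace normalization, or nested-list indentation, so for anything beyond very simple markup lynx or html2text remain the safer choices.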

Upvotes: 1
