LeafStorm

Reputation: 3127

How do I generate a table of contents for HTML text in Python?

Assume that I have some HTML code, like this (generated from Markdown or Textile or something):

<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>
<!-- and so on -->

How could I generate a table of contents for it using Python?

Upvotes: 1

Views: 1467

Answers (2)

kloffy

Reputation: 2928

Here's an example using lxml and XPath.

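from lxml import etree

# etree.parse expects well-formed XML with a single root element.
doc = etree.parse("test.xml")
# The XPath union returns matching heading elements in document order.
for node in doc.xpath('//h1|//h2|//h3|//h4|//h5'):
    print(node.tag, node.text)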

Upvotes: 3

Ignacio Vazquez-Abrams

Reputation: 799430

Use an HTML parser such as lxml or BeautifulSoup to find all header elements.
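
As a rough illustration of that idea, here is a minimal sketch. To stay dependency-free it swaps in the standard library's html.parser rather than lxml or BeautifulSoup; the names HeadingCollector and build_toc are made up for this example, and rendering the result as an indented plain-text list is just one assumed output format.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every closed h1-h6 element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased. A heading that is never closed is
        # dropped as soon as the next heading opens.
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        # Keep text only while inside a heading, including nested inline tags.
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            # Collapse runs of whitespace so multi-line headings stay on one line.
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None


def build_toc(html):
    """Return the table of contents as plain-text lines, indented by heading level."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return ["  " * (level - 1) + text for level, text in collector.headings]


sample = """
<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>
"""

toc_lines = build_toc(sample)
print("\n".join(toc_lines))
```

For the sample from the question this prints the two h1 entries flush left with the two h2 entries indented beneath the first one. If you need links rather than plain text, you would also have to give each heading an id and emit anchors pointing at them, which this sketch does not attempt.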

Upvotes: 6
