Reputation: 3127
Assume that I have some HTML code, like this (generated from Markdown or Textile or something):
<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>
<!-- and so on -->
How could I generate a table of contents for it using Python?
Upvotes: 1
Views: 1467
Reputation: 2928
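Here's an example using lxml and XPath. It uses lxml's HTML parser, because markup like the snippet in the question has several top-level elements and is not well-formed XML.
from lxml import etree
# The lenient HTML parser accepts HTML fragments that a strict XML parse would reject.
doc = etree.parse("test.xml", etree.HTMLParser())
# The union XPath returns the matching headers in document order.
for node in doc.xpath('//h1|//h2|//h3|//h4|//h5'):
    print(node.tag, node.text)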
Upvotes: 3
Reputation: 799430
Use an HTML parser such as lxml or BeautifulSoup to find all header elements.
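As a rough illustration of that idea, here is a minimal sketch that collects the headers and prints an indented table of contents. To stay dependency-free it swaps in the standard library's `html.parser` rather than lxml or BeautifulSoup, and it assumes every header has a closing tag. The names `HeaderCollector`, `build_toc`, and `sample` are made up for this example.

```python
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeaderCollector(HTMLParser):
    """Collect (level, text) pairs for each header element, in document order."""

    def __init__(self):
        super().__init__()
        self.headers = []    # finished (level, text) pairs
        self._level = None   # level of the header currently open, if any
        self._chunks = []    # text pieces seen inside the open header

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            self._level = int(tag[1])  # "h2" -> 2
            self._chunks = []

    def handle_data(self, data):
        # Only keep text that appears while a header is open.
        if self._level is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag in HEADER_TAGS and self._level is not None:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._chunks).split())
            self.headers.append((self._level, text))
            self._level = None


def build_toc(html):
    """Return an indented plain-text table of contents for an HTML string."""
    parser = HeaderCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(
        "  " * (level - 1) + "- " + text for level, text in parser.headers
    )


# Hypothetical sample input, modelled on the markup in the question.
sample = """
<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>
"""
print(build_toc(sample))
```

Each header is indented by two spaces per level below `h1`, so the output can be pasted back into Markdown as a nested list. To make the entries clickable you would also need to give each header an `id` and emit links, which this sketch does not attempt.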
Upvotes: 6