Wrap the contents of a tag with BeautifulSoup

Question

I'm trying to wrap the contents of a tag with BeautifulSoup. This:


    Footnote 1
    Footnote 2

should become this:


  
    Footnote 1
    Footnote 2

So I use the following code:

```python
footnotes = soup.findAll("div", { "class" : "footnotes" })
footnotes_contents = ''
new_ol = soup.new_tag("ol") 
for content in footnotes[0].children:
    new_tag = soup.new_tag(content)
    new_ol.append(new_tag)

footnotes[0].clear()
footnotes[0].append(new_ol)

print footnotes[0]
```

but I get the following:

<
    ><Footnote 1
>Footnote 1><
    ><Footnote 2
>Footnote 2><
>

Suggestions?

unutbu · Accepted Answer

Using lxml:

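```python
import lxml.html as LH
import lxml.builder as builder
E = builder.E

doc = LH.parse('data')  # 'data' is the file holding the HTML
# './/' searches every descendant of the root element
footnote = doc.find('.//div[@class="footnotes"]')
ol = E.ol()
# Each element has exactly one parent, so append() moves it out of footnote
for tag in footnote:
    ol.append(tag)
footnote.append(ol)
# encoding='unicode' returns str instead of bytes on Python 3
print(LH.tostring(doc.getroot(), encoding='unicode'))
```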

prints


    Footnote 1
    Footnote 2

Note that with lxml, an Element (tag) can be in only one place in the tree (every Element has exactly one parent), so appending tag to ol also removes it from footnote. So, unlike the BeautifulSoup version below, you do not need to iterate over the contents in reverse order or use insert(0, ...); you just append in order.
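A self-contained sketch of the same lxml approach that parses from a string instead of the data file. It assumes the footnotes are `<p>` elements, since the question's inner tags are not shown. After the loop, the div has a single child, the new ol, which confirms that append() moved the elements rather than copying them.

```python
import lxml.html as LH
from lxml.builder import E

# Assumed sample markup: <p> is a guess for the question's unknown inner tags
html = '<div class="footnotes"><p>Footnote 1</p><p>Footnote 2</p></div>'
footnote = LH.fragment_fromstring(html)  # returns the single <div> element

ol = E.ol()
# Iterate over a snapshot of the children; each append() moves the element
# (together with its tail text) out of footnote and into ol
for tag in list(footnote):
    ol.append(tag)
footnote.append(ol)

print(LH.tostring(footnote, encoding="unicode"))
```

If you need the original elements to stay where they are, append a `copy.deepcopy()` of each one instead.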

Using BeautifulSoup:

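```python
import bs4 as bs
with open('data', 'r') as f:
    # Naming the parser explicitly avoids bs4's "no parser specified" warning
    soup = bs.BeautifulSoup(f, 'html.parser')

footnote = soup.find("div", { "class" : "footnotes" })
new_ol = soup.new_tag("ol")

# Walk the children backwards: extract() removes each one from footnote.contents,
# and insert(0, ...) puts it at the front of new_ol, preserving the original order
for content in reversed(footnote.contents):
    new_ol.insert(0, content.extract())

footnote.append(new_ol)
print(soup)
```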

prints


Footnote 1
Footnote 2
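The reverse/insert(0, ...) loop is needed because extract() removes each node from footnote.contents while that same list is being iterated, so a plain forward loop would skip every other node. A sketch of an alternative, assuming the same `<p>` sample markup and the built-in html.parser, is to iterate over a copy of contents and append in document order:

```python
from bs4 import BeautifulSoup

# Assumed sample markup: <p> stands in for the question's unknown inner tags
html = '<div class="footnotes"><p>Footnote 1</p><p>Footnote 2</p></div>'
soup = BeautifulSoup(html, "html.parser")

footnote = soup.find("div", {"class": "footnotes"})
new_ol = soup.new_tag("ol")

# list() takes a snapshot, so extracting nodes from footnote.contents
# does not disturb the loop, and append() keeps the original order
for content in list(footnote.contents):
    new_ol.append(content.extract())

footnote.append(new_ol)
print(footnote)
```

Both versions produce the same tree; the copy simply trades a little memory for a loop that reads in document order.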
