Francis

Reputation: 857

BeautifulSoup decode_contents() returns &lt; instead of <

What I'm trying to do: merge each li tag with class="list_detail" into the li that precedes it.

In the code below, the first list_detail works as expected and gives <li>cccc<br/>dddd</li>

The second one gives
<li>aaa<br/>&lt;p&gt;bb&lt;i&gt;b&lt;/i&gt;b&lt;/p&gt;</li>

It should give <li>aaa<br/><p>bb<i>b</i>b</p></li>

from bs4 import BeautifulSoup, Tag, NavigableString
soup = BeautifulSoup("""
<ol>
    <li>cccc</li><li class='list_detail'>dddd</li>
    <li>aaa</li><li class='list_detail'><p>bb<i>b</i>b</p></li>
</ol>""", "html.parser")
details = soup.findAll("li", attrs={'class': "list_detail"})

for detail in details:
    new_div = soup.new_tag("br")
    prev = detail.previous_sibling
    prev.append(new_div)
    prev.append(detail.decode_contents())
    detail2 = prev.next_sibling
    print(f"detail2={detail2}")
    print(f"prev.contents={prev.contents}")
    print("prev=" + str(prev))
    if detail2 is not None:
        detail2.decompose()

print(f"soup={soup}")

An added quirk: when I add a line break between <li>cccc</li> and <li class='list_detail'>dddd</li>, I get the error AttributeError: 'NavigableString' object has no attribute 'contents'

Upvotes: 1

Views: 881

Answers (1)

vht981230

Reputation: 4498

The line

prev.append(detail.decode_contents())

in your code passes a plain string to append(), so BeautifulSoup adds it to prev as text rather than as tags, and the < and > characters are escaped to &lt; and &gt; when the tree is printed. To append <p>bb<i>b</i>b</p> to prev as actual tags, you can parse the string into a bs4.BeautifulSoup object first, by changing the line above to

prev.append(BeautifulSoup(detail.decode_contents(), "html.parser"))
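For reference, here is a minimal sketch of the whole loop with that change applied. The names merge_details, html and merged are made up for this example. It also swaps previous_sibling for find_previous_sibling("li"), which is my assumption about how to handle the quirk from the question: with a line break between the li tags, previous_sibling is the whitespace text node (a NavigableString) rather than the previous li.

```python
from bs4 import BeautifulSoup

# Same markup as in the question, but with line breaks between the <li> tags.
html = """
<ol>
    <li>cccc</li>
    <li class='list_detail'>dddd</li>
    <li>aaa</li>
    <li class='list_detail'><p>bb<i>b</i>b</p></li>
</ol>"""


def merge_details(markup):
    """Hypothetical helper: fold each li.list_detail into the li before it."""
    soup = BeautifulSoup(markup, "html.parser")
    for detail in soup.find_all("li", class_="list_detail"):
        # Skip whitespace text nodes and look for the previous <li> tag.
        prev = detail.find_previous_sibling("li")
        if prev is None:
            continue  # nothing to merge into
        prev.append(soup.new_tag("br"))
        # Re-parse the inner HTML so it is appended as tags, not as text.
        prev.append(BeautifulSoup(detail.decode_contents(), "html.parser"))
        detail.decompose()
    return soup


merged = merge_details(html)
for li in merged.find_all("li"):
    print(li)
# <li>cccc<br/>dddd</li>
# <li>aaa<br/><p>bb<i>b</i>b</p></li>
```

Re-parsing the inner HTML keeps the change close to the original code. Moving the existing child nodes across instead of re-parsing would probably also work, but I have not shown that here.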

Upvotes: 2
