Dany

Reputation: 4740

How can I remove a NavigableString from the tree?

I am a bit confused: every tag has a decompose() method that removes the tag from the tree in place. But what if I want to remove a NavigableString? It has no such method:

>>> b = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')
>>> b.p.contents[0]
'aaaa '
>>> type(b.p.contents[0])
<class 'bs4.element.NavigableString'>
>>> b.p.contents[0].decompose()
Traceback (most recent call last):
...
AttributeError: 'NavigableString' object has no attribute 'decompose'

I managed to partially remove the NavigableString from the tree by popping it from the contents list:

>>> b.p.contents.pop(0)
'aaaa '
>>> b
<p><span> bbbbb </span> ccccc</p>

The problem is that it still shows up in the output of the strings generator:

>>> list(b.strings)
['aaaa ', ' bbbbb ', ' ccccc']

This shows that it was the wrong way to do it. Besides, I use strings in my code, so this hacky solution is not acceptable, alas.


So the question is: how can I remove a specific NavigableString object from the tree?

Upvotes: 8

Views: 2602

Answers (1)

KunduK

Reputation: 33384

Use extract() instead of decompose().

extract() removes a tag or string from the tree and returns the removed element.

decompose() removes a tag from the tree, then destroys it and its contents.

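from bs4 import BeautifulSoup

b = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')
b.p.contents[0].extract()  # detach the first string ('aaaa ') from the tree
print(b)  # <p><span> bbbbb </span> ccccc</p>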
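Since the question relies on strings, here is a minimal sketch checking that a string removed with extract() no longer appears there. The variable names (soup, removed, remaining) are illustrative only and not part of the original question.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')

# extract() detaches the NavigableString and returns it,
# so it can still be inspected or re-inserted elsewhere.
removed = soup.p.contents[0].extract()

# Unlike contents.pop(0), the string is gone from the strings generator too.
remaining = list(soup.strings)
print(remaining)       # [' bbbbb ', ' ccccc']
print(removed.parent)  # None: the string is no longer attached to any tag
```

The difference from contents.pop(0) is that extract() also fixes up the element's links to its neighbours, which is what strings walks over.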

For more details, see the BeautifulSoup documentation.

Upvotes: 10
