Reputation: 4740
I am a bit confused: all tags have a decompose() method, which removes the tag from the tree in place. But what if I want to remove a NavigableString? It has no such method:
>>> b = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')
>>> b.p.contents[0]
'aaaa '
>>> type(b.p.contents[0])
<class 'bs4.element.NavigableString'>
>>> b.p.contents[0].decompose()
Traceback (most recent call last):
...
AttributeError: 'NavigableString' object has no attribute 'decompose'
I did manage to remove the NavigableString from the tree, more or less, by popping it from the parent's contents list:
>>> b.p.contents.pop(0)
'aaaa '
>>> b
<p><span> bbbbb </span> ccccc</p>
The problem is that it still shows up in the output of the strings generator:
>>> list(b.strings)
['aaaa ', ' bbbbb ', ' ccccc']
This shows that it was the wrong way to do it. Besides, I rely on strings in my code, so this hacky solution is not acceptable, alas.
So the question is: how can I remove a specific NavigableString object from the tree?
Upvotes: 8
Views: 2602
Reputation: 33384
Use extract() instead of decompose().
extract() removes a tag or string from the tree and returns it.
decompose() removes a tag from the tree, then destroys it and its contents.
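from bs4 import BeautifulSoup

b = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')
b.p.contents[0].extract()  # detach the first child of <p>, the string 'aaaa '
print(b)  # <p><span> bbbbb </span> ccccc</p>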
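To check that this also addresses the strings issue from the question, here is a minimal sketch using the same markup. The variable names (soup, target, removed, remaining) are only illustrative, and it assumes bs4 is installed with the built-in html.parser.

```python
from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')

# Pick the exact string node to remove (here, the first child of <p>).
target = soup.p.contents[0]
assert isinstance(target, NavigableString)

# extract() detaches the node from the tree and returns it.
removed = target.extract()

# Unlike contents.pop(0), the removed string no longer appears in .strings.
remaining = list(soup.strings)

print(repr(removed))  # 'aaaa '
print(remaining)      # [' bbbbb ', ' ccccc']
print(soup)           # <p><span> bbbbb </span> ccccc</p>
```

Because extract() returns the detached node, it can be kept and reinserted elsewhere if needed; if it is simply discarded, the string is gone from both the markup and the strings output.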
For more details, see the BeautifulSoup documentation.
Upvotes: 10