Reputation: 1473
Referring to "How can I strip comment tags from HTML using BeautifulSoup?", I am trying to remove the comments from the tag below:
>>> h
<h4 class="col-sm-4"><!-- react-text: 124 -->52 Week High/Low:<!-- /react-text --><b><!-- react-text: 126 --> ₹ <!-- /react-text --><!-- react-text: 127 -->394.00<!-- /react-text --><!-- react-text: 128 --> / ₹ <!-- /react-text --><!-- react-text: 129 -->252.10<!-- /react-text --></b></h4>
My code -
comments = h.findAll(text=lambda text:isinstance(text, Comment))
[comment.extract() for comment in comments]
print h
But the search for comments results in nothing. I want to extract the 2 values - "52 Week High/Low:" and "₹ 394.00 / ₹ 252.10" from the above Tag.
I also tried removing the comments from the entire HTML using
soup = BeautifulSoup(html)
comments = soup.findAll(text=lambda text:isinstance(text, Comment))
[comment.extract() for comment in comments]
print soup
But the comments are still there. Any suggestions?
Upvotes: 2
Views: 3241
Reputation: 550
Are you using Python 2.7 and BeautifulSoup4? If not the latter, I would install BeautifulSoup4:
pip install beautifulsoup4
The following script works for me. I just copied and pasted the HTML from your question above and ran it.
from bs4 import BeautifulSoup, Comment
html = """<h4 class="col-sm-4"><!-- react-text: 124 -->52 Week High/Low:<!-- /react-text --><b><!-- react-text: 126 --> ₹ <!-- /react-text --><!-- react-text: 127 -->394.00<!-- /react-text --><!-- react-text: 128 --> / ₹ <!-- /react-text --><!-- react-text: 129 -->252.10<!-- /react-text --></b></h4>"""
soup = BeautifulSoup(html)
comments = soup.findAll(text=lambda text:isinstance(text, Comment))
# nit: It isn't good practice to use a list comprehension only for its
# side-effects. (Wastes space constructing an unused list)
for comment in comments:
    comment.extract()
print soup
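Once the comments are gone, the two values the question asks for can be read off the remaining text nodes. The following is only a sketch, not the one right way to do it: it is written for Python 3 (hence `print()` as a function), it names the `"html.parser"` parser explicitly, and the variable names `label` and `value` are my own. It also assumes the first text node in the `h4` is the label and everything after it belongs to the value, which holds for this snippet but may not hold for other markup.

```python
from bs4 import BeautifulSoup, Comment

html = """<h4 class="col-sm-4"><!-- react-text: 124 -->52 Week High/Low:<!-- /react-text --><b><!-- react-text: 126 --> ₹ <!-- /react-text --><!-- react-text: 127 -->394.00<!-- /react-text --><!-- react-text: 128 --> / ₹ <!-- /react-text --><!-- react-text: 129 -->252.10<!-- /react-text --></b></h4>"""

# Assumption: the built-in parser is good enough for this fragment.
soup = BeautifulSoup(html, "html.parser")

# Remove every comment node, as in the script above.
for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
    comment.extract()

h4 = soup.find("h4")

# stripped_strings yields each remaining text node with whitespace trimmed.
parts = list(h4.stripped_strings)

# Assumption: first text node is the label, the rest make up the value.
label = parts[0]
value = " ".join(parts[1:])

print(label)
print(value)
```

If the page has several such `h4` tags, the same steps could be run in a loop over `soup.find_all("h4")`.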
Note: It's a good thing you posted the
Upvotes: 4