Reputation: 11
I am programming in Python 2.7 and using beautifulsoup4 to extract information from the tags of a series of documents. However, the documents also contain comment strings such as:
<!-- PJG ITAG l=90 g=1 f=4 -->
I want to get rid of them, but I am not an expert on regular expressions. Can someone help with this, please?
Upvotes: 1
Views: 76
Reputation: 55233
Start by loading your HTML in BeautifulSoup:
from bs4 import BeautifulSoup, Comment
soup = BeautifulSoup(the_html, "html.parser")  # naming the parser avoids the "no parser specified" warning
Then, remove all the comments:
comments = soup.find_all(text=lambda text: isinstance(text, Comment))
for comment in comments:
    comment.extract()  # detach the comment node from the tree
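For reference, here is a minimal end-to-end sketch of the same approach. It makes a few assumptions: the helper name `strip_comments` and the sample markup are made up for illustration, the built-in `"html.parser"` is chosen only so that nothing beyond bs4 is needed, and `string=` is used because it is the newer name for the `text=` argument (Beautiful Soup 4.4.0 and later). On an older bs4, `text=` should behave the same way.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html):
    """Return the markup with every HTML comment node removed.

    Hypothetical helper, not part of bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Comment is a subclass of NavigableString, so a string filter
    # with isinstance() picks out only the comment nodes.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()  # detach the node from the tree
    return str(soup)


# Made-up sample containing the kind of comment from the question.
sample = "<p>Hello<!-- PJG ITAG l=90 g=1 f=4 --> world</p>"
cleaned = strip_comments(sample)
print(cleaned)
```

Because the comments are removed from the parsed tree rather than from the raw text, no regular expression is needed, and any later `find_all` calls on the same soup will not see them.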
Upvotes: 3