Reputation: 819
I am trying to comment out parts of an HTML page that I may want later, instead of removing them with Beautiful Soup's tag.extract() function. For example:
<h1> Name of Article </h1>
<p>First Paragraph I want</p>
<p>More Html I'm interested in</p>
<h2> Subheading in the article I also want </h2>
<p>Even more Html i want blah blah blah</p>
<h2> References </h2>
<p>Html I want commented out</p>
I want the References heading and everything below it commented out. I know I can remove those elements with Beautiful Soup's extract():
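import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(data, "lxml")
references = soup.find("h2", text=re.compile("References"))
# Remove every tag that follows the heading, then the heading itself.
for elm in references.find_next_siblings():
    elm.extract()
references.extract()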
I also know that Beautiful Soup lets you create a comment node, which you can use like so:
from bs4 import Comment
commented_tag = Comment(chunk_of_html_parsed_somewhere_else)
soup.append(commented_tag)
This seems unpythonic and cumbersome when all I want is to wrap a specific tag in HTML comment markers, especially if the tag sits in the middle of a deep HTML tree. Isn't there an easier way to find a tag with Beautiful Soup and simply place <!-- and --> directly before and after it? Thanks in advance.
Upvotes: 1
Views: 632
Reputation: 473833
Assuming I understand the problem correctly, you can use replace_with() to replace a tag with a Comment instance. This is probably the simplest way to comment out an existing tag:
import re
from bs4 import BeautifulSoup, Comment
data = """
<div>
<h1> Name of Article </h1>
<p>First Paragraph I want</p>
<p>More Html I'm interested in</p>
<h2> Subheading in the article I also want </h2>
<p>Even more Html i want blah blah blah</p>
<h2> References </h2>
<p>Html I want commented out</p>
</div>"""
soup = BeautifulSoup(data, "lxml")
elm = soup.find("h2", text=re.compile("References"))
elm.replace_with(Comment(str(elm)))
print(soup.prettify())
Prints:
<html>
 <body>
  <div>
   <h1>
    Name of Article
   </h1>
   <p>
    First Paragraph I want
   </p>
   <p>
    More Html I'm interested in
   </p>
   <h2>
    Subheading in the article I also want
   </h2>
   <p>
    Even more Html i want blah blah blah
   </p>
   <!--<h2> References </h2>-->
   <p>
    Html I want commented out
   </p>
  </div>
 </body>
</html>
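Note that the output above comments out only the heading; the paragraph after it is still live. If the goal is to comment out the heading and everything that follows it, one possible approach is to serialize the heading together with its following sibling tags, remove the siblings, and replace the heading with a single Comment. The sketch below is only an illustration: the helper name comment_out_from is made up here, the shortened sample markup is not the original page, and it uses the built-in "html.parser" and the string= argument (the newer spelling of text=) rather than the "lxml" setup shown earlier. It also assumes the commented region contains no comments of its own, since HTML comments cannot be nested.

```python
import re
from bs4 import BeautifulSoup, Comment


def comment_out_from(start):
    """Replace `start` and all following sibling tags with one HTML comment.

    Hypothetical helper, not part of Beautiful Soup itself.
    """
    # Build the list up front, because the tree is mutated below.
    # find_next_siblings() with no arguments returns tags only, so
    # whitespace text nodes between the tags are left where they are.
    nodes = [start] + list(start.find_next_siblings())
    markup = "".join(str(node) for node in nodes)
    # Detach the trailing siblings, then swap the heading for the comment.
    for node in nodes[1:]:
        node.extract()
    start.replace_with(Comment(markup))


# Shortened sample markup (an assumption, not the original page).
data = (
    "<div><h1>Title</h1><p>Keep</p>"
    "<h2> References </h2><p>Drop one</p><p>Drop two</p></div>"
)
soup = BeautifulSoup(data, "html.parser")
heading = soup.find("h2", string=re.compile("References"))
comment_out_from(heading)
result = str(soup)
print(result)
# <div><h1>Title</h1><p>Keep</p><!--<h2> References </h2><p>Drop one</p><p>Drop two</p>--></div>
```

Because the original markup is kept verbatim inside the comment, it can be recovered later by parsing the comment's text again with BeautifulSoup.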
Upvotes: 1