EazyC

Reputation: 819

Beautiful Soup: Best ways to comment out a tag instead of extracting it?

I am trying to comment out parts of an HTML page that I may want later, instead of removing them with Beautiful Soup's tag.extract() function. For example:

<h1> Name of Article </h1>
<p>First Paragraph I want</p>
<p>More Html I'm interested in</p>
<h2> Subheading in the article I also want </h2>
<p>Even more Html i want blah blah blah</p>
<h2> References </h2> 
<p>Html I want commented out</p>

I want everything from the References heading down (including the heading itself) commented out. Obviously, I can remove those elements using Beautiful Soup's extract() method:

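import re

from bs4 import BeautifulSoup

# data holds the HTML shown above
soup = BeautifulSoup(data, "lxml")

# Find the References heading, then remove every following sibling and the heading itself
references = soup.find("h2", string=re.compile("References"))
for elm in references.find_next_siblings():
    elm.extract()
references.extract()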

I also know that Beautiful Soup lets you create comments, like so:

from bs4 import Comment

commented_tag = Comment(chunk_of_html_parsed_somewhere_else)
soup.append(commented_tag)

This seems unpythonic and a cumbersome way to simply wrap a specific tag in HTML comment markers, especially if the tag is located in the middle of a deep HTML tree. Isn't there an easier way to find a tag with Beautiful Soup and simply place <!-- --> directly before and after it? Thanks in advance.

Upvotes: 1

Views: 632

Answers (1)

alecxe

Reputation: 473833

Assuming I understand the problem correctly, you can use replace_with() to replace a tag with a Comment instance. This is probably the simplest way to comment out an existing tag:

import re

from bs4 import BeautifulSoup, Comment

data = """
<div>
    <h1> Name of Article </h1>
    <p>First Paragraph I want</p>
    <p>More Html I'm interested in</p>
    <h2> Subheading in the article I also want </h2>
    <p>Even more Html i want blah blah blah</p>
    <h2> References </h2>
    <p>Html I want commented out</p>
</div>"""

soup = BeautifulSoup(data, "lxml")
# Find the heading, then swap it for a Comment that holds its serialized markup
elm = soup.find("h2", string=re.compile("References"))
elm.replace_with(Comment(str(elm)))

print(soup.prettify())

Prints:

<html>
 <body>
  <div>
   <h1>
    Name of Article
   </h1>
   <p>
    First Paragraph I want
   </p>
   <p>
    More Html I'm interested in
   </p>
   <h2>
    Subheading in the article I also want
   </h2>
   <p>
    Even more Html i want blah blah blah
   </p>
   <!--<h2> References </h2>-->
   <p>
    Html I want commented out
   </p>
  </div>
 </body>
</html>
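
Note that the snippet above comments out only the References heading, while the question asks for the heading and everything after it. A minimal sketch of one way to extend the same replace_with() idea to the following siblings is shown here. The helper name comment_out is hypothetical (it is not part of Beautiful Soup), the sample HTML is shortened, and "html.parser" is used instead of "lxml" only so the example has no extra dependency:

```python
import re

from bs4 import BeautifulSoup, Comment

data = """
<div>
    <h1> Name of Article </h1>
    <p>First Paragraph I want</p>
    <h2> References </h2>
    <p>Html I want commented out</p>
</div>"""

# Assumption: the built-in parser is enough for this small sample.
soup = BeautifulSoup(data, "html.parser")


def comment_out(tag):
    """Hypothetical helper: replace a tag in place with a Comment
    holding the tag's serialized markup."""
    tag.replace_with(Comment(str(tag)))


references = soup.find("h2", string=re.compile("References"))

# Collect the heading and its following sibling tags before changing
# the tree, so the replacements do not interfere with the traversal.
targets = [references, *references.find_next_siblings()]
for tag in targets:
    comment_out(tag)

print(soup)
```

Each element ends up in its own comment, in its original position. One caveat: if the markup being commented out already contains a comment (or the sequence "-->"), the resulting comment will end early, so such content would need separate handling.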

Upvotes: 1
