Beautiful Soup replaces

Question

I've found the text I want to replace, but when I print soup the format gets changed.

stuff here

becomes <div id="content">stuff here</div>. How can i preserve the data? I have tried print(soup.encode(formatter="none")), but that produces the same incorrect format.

from bs4 import BeautifulSoup

with open(index_file) as fp:
    soup = BeautifulSoup(fp,"html.parser")

found = soup.find("div", {"id": "content"})
found.replace_with(data)

When I print found, I get the correct format:

>>> print(found)
stuff

index_file contents are below:

 
 
    Apples 
 
 

   
    This is the Id of the page

  

     
       stuff here
     
  
 footer should go here

Mad Physicist · Accepted Answer

The found object is not a Python string, it's a Tag that just happens to have a nice string representation. You can verify this by doing

type(found)

A Tag is part of the hierarchy of objects that Beautiful Soup creates for you to be able to interact with the HTML. Another such object is NavigableString. NavigableString is a lot like a string, but it can only contain things that would go into the content portion of the HTML.

When you do

found.replace_with('stuff here')

you are asking the Tag to be replaced with a NavigableString containing that literal text. The only way for HTML to be able to display that string is to escape all the angle brackets, as it's doing.

Instead of that mess, you probably want to keep your Tag, and replace only it's content:

found.string.replace_with('stuff here')

Notice that the correct replacement does not attempt to overwrite the tags.

When you do found.replace_with(...), the object referred to by the name found gets replaced in the parent hierarchy. However, the name found keeps pointing to the same outdated object as before. That is why printing soup shows the update, but printing found does not.

Beautiful Soup replaces < with &lt;

Answers (1)

Related Questions

Beautiful Soup replaces &lt; with &amp;lt;

Answers (1)

Related Questions

Beautiful Soup replaces < with <