shan

Reputation: 477

Python - Save back changes using BeautifulSoup

I am parsing an HTML file using BeautifulSoup and checking whether each piece of text is in uppercase; if it is, I change it to lowercase. When I save the output to a new HTML file, the changes are not reflected. Can someone point out what I am doing wrong?

def recursiveChildren(x):
    if "childGenerator" in dir(x):
      for child in x.childGenerator():
          name = getattr(child, "name", None)
          if name is not None:
             print(child.name)
          recursiveChildren(child)
    else:
      if not x.isspace():
         print (x)
         if(x.isupper()):
          x.string = x.lower()
          x=x.replace(x,x.string)

if __name__ == "__main__":
    with open("\path\") as fp:
      soup = BeautifulSoup(fp)
    for child in soup.childGenerator():
       recursiveChildren(child)
    html = soup.prettify("utf-8")
    with open("\path\") as file:
      file.write(html)

Upvotes: 1

Views: 928

Answers (1)

Dan-Dev

Reputation: 9420

I don't think your approach would cope with mixed markup like:

 <p>TEXT<span>More Text<i>TEXT</i>TEXT</span>TEXT</p>

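Also, the method you want is replaceWith(), not replace(). Calling replace() on a text node is just str.replace(): it returns a new string and leaves the parse tree untouched. You also have not opened your output file for writing.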
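To make the replace() versus replaceWith() point concrete, here is a minimal sketch. It assumes bs4 version 4 is installed and uses replace_with(), the current spelling of replaceWith(); the one-paragraph input is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical one-paragraph document, just for illustration.
soup = BeautifulSoup("<p>TEXT</p>", "html.parser")

# str.replace() builds a brand-new string. Assigning it to a local
# name only rebinds that name; the tree still holds the old node.
node = soup.p.string
node = node.replace(node, node.lower())
unchanged = str(soup)

# replace_with() swaps the node inside the tree itself.
soup.p.string.replace_with(soup.p.string.lower())
changed = str(soup)

print(unchanged)  # <p>TEXT</p>
print(changed)    # <p>text</p>
```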

This is how I would do it:

from bs4 import BeautifulSoup

filename = "test.html"

if __name__ == "__main__":
    # Open and parse the file.
    with open(filename, "r") as fp:
        soup = BeautifulSoup(fp, "html.parser")  # Or BeautifulSoup(fp, "lxml")
    # Iterate over every text node found in the document.
    # find_all(string=True) is the current spelling of findAll(text=True).
    for txt in soup.find_all(string=True):
        # True if all the cased characters (letters) of the string are uppercase.
        if txt.isupper():
            # Swap the node in the tree for a lowercase copy.
            # replace_with() is the current name for replaceWith().
            txt.replace_with(txt.lower())
    # Write the file as UTF-8 bytes, hence the "wb" mode.
    with open(filename, "wb") as file:
        file.write(soup.prettify("utf-8"))
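
As a quick check without touching any files, the same idea can be run on the mixed markup from the top of this answer. This is only a sketch: the helper name lowercase_uppercase_text is made up for the example, and it assumes bs4 version 4 with the built-in html.parser.

```python
from bs4 import BeautifulSoup


def lowercase_uppercase_text(html):
    """Return html with every all-uppercase text node lowercased.

    Hypothetical helper for illustration; not part of bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list, so replacing nodes while looping is safe.
    for node in soup.find_all(string=True):
        if node.isupper():
            node.replace_with(node.lower())
    return str(soup)


sample = "<p>TEXT<span>More Text<i>TEXT</i>TEXT</span>TEXT</p>"
result = lowercase_uppercase_text(sample)
print(result)
# <p>text<span>More Text<i>text</i>text</span>text</p>
```

Note that "More Text" is left alone because isupper() is only true when every cased character in the string is uppercase.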

Upvotes: 1
