I am parsing an HTML file with BeautifulSoup, checking whether each piece of text is uppercase and, if it is, changing it to lowercase. When I save the output to a new HTML file, the changes are not reflected. Can someone point out what I am doing wrong?
def recursiveChildren(x):
    if "childGenerator" in dir(x):
        for child in x.childGenerator():
            name = getattr(child, "name", None)
            if name is not None:
                print(child.name)
            recursiveChildren(child)
    else:
        if not x.isspace():
            print (x)
            if(x.isupper()):
                x.string = x.lower()
                x=x.replace(x,x.string)

if __name__ == "__main__":
    with open("\path\) as fp:
        soup = BeautifulSoup(fp)
    for child in soup.childGenerator():
        recursiveChildren(child)
    html = soup.prettify("utf-8")
    with open("\path\") as file:
        file.write(html)
I don't think your way would cope with markup like:
<p>TEXT<span>More Text<i>TEXT</i>TEXT</span>TEXT</p>
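Also, the method you want is replace_with() (spelled replaceWith() in older code), not replace(). A NavigableString is a str, so str.replace() just returns a new string, and assigning it to x only rebinds your local variable; the parse tree is never touched. Finally, you have not opened your output file for writing: open() defaults to read mode.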
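A minimal sketch of the difference, assuming bs4 is installed and using the built-in "html.parser" backend; the one-tag document and the variable names are made up for illustration:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>TEXT</p>", "html.parser")
node = soup.p.string  # the NavigableString "TEXT" inside <p>

# str.replace() builds a brand-new string; the node in the tree is unchanged.
new_text = node.replace(node, node.lower())
before = str(soup)

# replace_with() swaps the node inside the parse tree itself.
node.replace_with(node.lower())
after = str(soup)

print(before)  # <p>TEXT</p>
print(after)   # <p>text</p>
```

Only the second step changes what prettify() or str() will later serialize.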
This is the way I would do it.
from bs4 import BeautifulSoup

filename = "test.html"

if __name__ == "__main__":
    # Open the file.
    with open(filename, "r") as fp:
        soup = BeautifulSoup(fp, "html.parser")  # Or BeautifulSoup(fp, "lxml")

    # Iterate over all the text found in the document.
    # find_all() returns a list, so replacing nodes inside the loop is safe.
    for txt in soup.find_all(string=True):
        # If all the case-based characters (letters) of the string are uppercase.
        if txt.isupper():
            # Replace the node in the tree with its lowercase version.
            txt.replace_with(txt.lower())

    # Write the file; prettify("utf-8") returns bytes, hence mode "wb".
    with open(filename, "wb") as file:
        file.write(soup.prettify("utf-8"))
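The same idea can be wrapped in a function that takes and returns an HTML string, which makes it easy to check against the nested markup above. This is a sketch assuming bs4 and the "html.parser" backend; the function name lowercase_upper_text is hypothetical:

```python
from bs4 import BeautifulSoup


def lowercase_upper_text(html: str) -> str:
    """Return html with every all-uppercase text node lowercased (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Visit every text node, however deeply it is nested.
    for txt in soup.find_all(string=True):
        if txt.isupper():
            # Mutate the tree, not a local copy of the string.
            txt.replace_with(txt.lower())
    return str(soup)


if __name__ == "__main__":
    sample = "<p>TEXT<span>More Text<i>TEXT</i>TEXT</span>TEXT</p>"
    print(lowercase_upper_text(sample))
    # <p>text<span>More Text<i>text</i>text</span>text</p>
```

Mixed-case text such as "More Text" is left alone, because isupper() is only true when every cased character is uppercase.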